Peer-to-peer (P2P) locality has recently attracted a lot of interest in the community. Indeed, whereas P2P content distribution enables financial savings for content providers, it dramatically increases the traffic on inter-ISP links.

To solve this issue, the idea of keeping a fraction of the P2P traffic local to each ISP was introduced a few years ago. Since then, P2P solutions exploiting locality have been proposed. However, several fundamental questions on locality still need to be explored. In particular, how far can we push locality, and what reduction of traffic can locality achieve at the scale of the Internet?

In this paper, we perform extensive experiments in a controlled environment with up to 10 000 BitTorrent clients to evaluate the impact of high locality on inter-ISP link traffic and on peer download completion time.

We introduce two simple mechanisms that make high locality possible in challenging scenarios, and we show that high locality reduces inter-ISP traffic by up to several orders of magnitude compared to traditional locality, without adversely impacting peer download completion time. In addition, we crawled 214 443 torrents representing 6 113 224 unique peers spread among 9 605 ASes. We show that, whereas the torrents we crawled generated 11.6 petabytes of inter-ISP traffic, our locality policy, if implemented for all torrents, could have reduced the global inter-ISP traffic by up to 40%.
1. Introduction

Content distribution is today at the core of the services provided by the Internet. However, distributing content to a large audience is costly with a classical client-server or CDN solution. This is why content providers are starting to move to P2P content distribution, which significantly reduces their costs without penalizing the experience of users. One striking example is Murder, a BitTorrent extension used to update the Twitter infrastructure.

However, whereas current P2P content distribution solutions like BitTorrent are very efficient, they generate a huge amount of traffic on inter-ISP links [?]. Indeed, in BitTorrent, each peer that downloads a given content is connected to a small subset of peers picked at random among all the peers that download that content. Thus, even peers in the same ISP that are downloading the same content are not necessarily connected to each other. As a consequence, peers unnecessarily download most of the content from peers located outside their ISP.

Therefore, even if current P2P content replication solutions significantly reduce content provider costs, they cannot be promoted as a global solution for content replication, as they induce huge costs for ISPs. In particular, the current trend for ISPs is to block P2P traffic [?].

One solution to this problem is to use P2P locality. The goal of P2P locality is to constrain P2P traffic within ISPs' boundaries in order to minimize the amount of inter-ISP traffic.

The seminal work of Karagiannis et al. [?] is the first to suggest the use of locality in a P2P system in order to reduce the load on inter-ISP links. Using real traces, they show the potential for locality (in particular, the spatial and temporal correlation in the requests for contents) and, based on simulations driven by a BitTorrent tracker log, they evaluate the benefit of several architectures, in particular a P2P architecture exploiting locality. More recently, Xie et al. [?] proposed P4P, an architecture that enables cooperation between P2P applications and ISPs. Through large field tests, they show that P4P reduces external traffic for a monitored ISP and reduces peer download completion time. Choffnes et al. [?] proposed Ono, a BitTorrent extension that leverages a CDN infrastructure to locate peers in order to group peers that are close to each other. They show the benefit of Ono in terms of peer download completion time, and show that Ono can reduce the number of IP hops and AS hops among peers.

Given those works, there is no doubt that P2P locality has some benefits and that there are several ways to implement it. However, two fundamental questions are left unanswered by those previous works.

How far can we push locality? In all proposed solutions, the number of inter-ISP connections is kept high enough to guarantee good robustness to partitions, i.e., to a lack of connectivity among sets of peers resulting in poor download completion time. However, this robustness comes at the expense of larger inter-ISP traffic. How far can we push locality without impacting the robustness of the P2P protocol to partitions?

What is, at the scale of the Internet, the reduction of traffic that can be achieved with locality? It might be argued that P2P locality will bring little benefit at the scale of the Internet. Indeed, if most ISPs have just a few peers, keeping the traffic local to those ISPs will barely reduce inter-ISP traffic. Therefore, the question is: what is the distribution of peers per ISP in the Internet, and what inter-ISP bandwidth savings would a locality policy achieve? Previous works looking at inter-ISP bandwidth savings consider either indirect measurements (like the distribution of the number of ASes between any two peers with a direct connection), partial measurements (like the monitoring of a specific ISP), or simulations (like comparing various content distribution scenarios based on the location of peers obtained from a tracker log). For instance, Xie et al. [?] reported results on inter-ISP savings with P4P for a single ISP.

The answers to those questions will be fundamental once content providers use P2P content replication for large-scale distribution. In that case, ISPs will likely need to know the amount of inter-ISP traffic they can save with locality, and will accordingly request that content providers minimize the traffic due to P2P applications. At the same time, content providers will need a clear understanding of the impact of this traffic reduction on their customers.

Our contribution in this paper is to answer those questions by running extensive large-scale BitTorrent experiments (with up to 10 000 real BitTorrent clients) in a controlled environment, and by using real data we crawled from the Internet on 214 443 torrents representing 6 113 224 unique peers spread among 9 605 ASes. Our work can be summarized by the following two key contributions.

i) We show that we can push BitTorrent locality much further than what was previously proposed, which reduces inter-ISP traffic by several orders of magnitude while keeping peer download completion time low. In particular, we show in experiments including real-world data that neither the reduction of inter-ISP traffic nor the peer download completion time is significantly impacted by the torrent size, the number of peers per ISP, or churn. Finally, we propose new strategies to improve the efficiency and robustness of our locality policy in challenging scenarios defined from real-world torrents.

ii) We show that, at the scale of the 214 443 torrents we crawled, ISPs can largely benefit from locality. In particular, whereas all the crawled torrents generated 11.6 petabytes of inter-ISP traffic, high locality could have saved up to 40%, i.e., 4.6 petabytes, of inter-ISP traffic. This result is significantly different from the inter-ISP bandwidth savings reported by Xie et al. [?]. Indeed, they reported a reduction of inter-ISP traffic with P4P of around 60%, but for a single ISP with a single large torrent. Thus, they did not evaluate the reduction of BitTorrent traffic at the scale of the Internet, but only for a single ISP. The result we report is an estimate for 214 443 real torrents spread among 9 605 ASes, thus capturing the variety of torrent sizes and of distributions of peers per AS found in the Internet.

The remainder of this paper is organized as follows. We define the locality policy we use for our evaluation in section 2, then we describe our experimental setup and define metrics in section 3. We discuss the impact of the number of inter-ISP connections in section 4 and focus on a small number of inter-ISP connections in section 5. In section 6, we present results obtained from a large crawl of torrents in the Internet. In section 7, we discuss related work. Finally, we conclude in section 8.

2. Locality Policy 

In this paper, we experimentally evaluate the two questions discussed in the introduction. To do so, we introduce in the following a locality policy that we use to perform our evaluation. We do not claim that our locality policy is a definitive solution that should be deployed. Instead, it is a simple implementation that we used for our evaluation. Yet, we identified two important strategies that we recommend considering, even in a modified form, for any implementation of a locality policy.

In the following, we refer to the BitTorrent policy when the tracker does not implement our locality policy but the regular random policy.

2.1. Implementation of the Locality Policy 

We say that a connection is inter-ISP when two peers in two different ISPs have established a direct BitTorrent connection, and that it is intra-ISP when the two peers are in the same ISP. The goal of our locality policy is to limit the number of inter-ISP connections: the higher the locality, the smaller the number of inter-ISP connections.

We say that an inter-ISP connection is outgoing (resp. incoming) for an ISP if the connection was initiated by a peer inside (resp. outside) this ISP. However, once a connection is established, it is fully bidirectional.

In order to control the number of inter-ISP connections, we assume that the tracker can map each peer to its ISP. How this mapping is performed is orthogonal to our work. For instance, the tracker can simply map peers to ASes using precomputed mapping information obtained from BGP tables [?]. If the AS level is not appropriate for ISPs, the tracker can use more sophisticated information, such as the information offered by the P4P infrastructure [?].

The only parameter of our locality policy is the maximum number of outgoing inter-ISP connections per ISP. For each ISP, the tracker maintains the number of peers outside this ISP that it returned to peers inside, along with the identity of those inside peers. This way, the tracker maintains a reasonable approximation of the number of outgoing inter-ISP connections for each ISP. When a peer $P$ asks the tracker for a new list of peers, the tracker maps this peer to the ISP $I_P$ it belongs to and returns to this peer a list of peers inside $I_P$; if the maximum number of outgoing inter-ISP connections is not yet reached for $I_P$, the tracker also returns one additional peer $P_o$ outside $I_P$ and increments by one the counter of outgoing connections for $I_P$. We also add a randomization factor to distribute the outgoing connections evenly among the peers of each ISP, in order to avoid a single point of failure in case the peer receiving the outgoing connections of a given ISP decides to leave.
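
The tracker-side logic just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in our experiments: the peer-to-ISP mapping is given as a plain dictionary, the answer size of 80 peers is an assumed parameter, and the rule that skips peers already holding an outgoing inter-ISP connection is a hypothetical stand-in for the randomization factor, whose exact form is not specified here. Class and method names are illustrative.

```python
import random
from collections import defaultdict


class LocalityTracker:
    """Hypothetical tracker enforcing a cap on outgoing inter-ISP connections per ISP."""

    def __init__(self, isp_of, max_outgoing, list_size=80, seed=None):
        self.isp_of = dict(isp_of)          # peer id -> ISP id (mapping assumed given)
        self.max_outgoing = max_outgoing    # the single parameter of the policy
        self.list_size = list_size          # assumed size of a tracker answer
        self.rng = random.Random(seed)
        self.outgoing = defaultdict(int)    # ISP -> outgoing inter-ISP connections handed out
        self.holders = defaultdict(set)     # ISP -> inside peers that received an outside peer

    def get_peers(self, peer):
        isp = self.isp_of[peer]
        inside = [p for p, i in self.isp_of.items() if i == isp and p != peer]
        answer = self.rng.sample(inside, min(self.list_size, len(inside)))
        # Add one outside peer only while the ISP is below its cap; skipping peers
        # that already hold one spreads the connections (assumed spreading rule).
        if self.outgoing[isp] < self.max_outgoing and peer not in self.holders[isp]:
            outside = [p for p, i in self.isp_of.items() if i != isp]
            if outside:
                answer.append(self.rng.choice(outside))  # default strategy: uniform at random
                self.outgoing[isp] += 1
                self.holders[isp].add(peer)
        return answer


# Usage: 30 peers spread over 3 ISPs, at most 2 outgoing connections per ISP.
isp_of = {f"p{n}": n % 3 for n in range(30)}
tracker = LocalityTracker(isp_of, max_outgoing=2, seed=1)
for peer in isp_of:
    tracker.get_peers(peer)
print(dict(tracker.outgoing))  # {0: 2, 1: 2, 2: 2}
```

Each answer contains at most one peer outside the requester's ISP, and the per-ISP counter never exceeds the cap, which is the invariant the policy relies on.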

Each regular BitTorrent client periodically contacts the tracker, typically every 30 minutes, to report statistics. Each time a peer leaves the torrent, it contacts the tracker so that the tracker can remove this peer from the list of peers in the torrent. If the client does not contact the tracker when it leaves (for instance, due to a crash of the client), the tracker automatically removes the peer after a predefined period of time, typically 45 minutes, following the peer's last contact with the tracker. Our locality policy uses this information to maintain an up-to-date count of the outgoing inter-ISP connections per ISP.
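
One possible way to keep the per-ISP counters consistent with peer departures is sketched below. It assumes, hypothetically, that each inside peer holds at most one outgoing inter-ISP connection (as with a tracker that spreads such connections over distinct peers), so that the count for an ISP is the number of live holders; the 45-minute timeout is the typical value quoted above. All names are illustrative.

```python
from collections import defaultdict

PEER_TIMEOUT_S = 45 * 60  # typical removal delay after the last tracker contact


class OutgoingBook:
    """Hypothetical per-ISP bookkeeping of outgoing inter-ISP connections."""

    def __init__(self):
        self.last_contact = {}             # peer -> (ISP, time of last contact in s)
        self.holders = defaultdict(set)    # ISP -> live peers holding an outgoing connection

    def announce(self, peer, isp, now):
        """Periodic announce (typically every 30 minutes)."""
        self.last_contact[peer] = (isp, now)

    def grant(self, peer, isp, now):
        """Record that the tracker returned an outside peer to `peer`."""
        self.announce(peer, isp, now)
        self.holders[isp].add(peer)

    def leave(self, peer):
        """Explicit departure: the peer's outgoing slot is released."""
        entry = self.last_contact.pop(peer, None)
        if entry is not None:
            self.holders[entry[0]].discard(peer)

    def expire(self, now):
        """Remove peers silent for longer than the timeout (e.g., crashed clients)."""
        stale = [p for p, (_, t) in self.last_contact.items() if now - t > PEER_TIMEOUT_S]
        for peer in stale:
            self.leave(peer)

    def outgoing(self, isp):
        """Current approximation of the ISP's outgoing inter-ISP connections."""
        return len(self.holders[isp])
```

For example, if two peers of an ISP receive outside peers at time 0 and only one of them announces again at 30 minutes, a call to `expire` at 50 minutes drops the silent one and frees its slot for another peer of that ISP.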

When the tracker implements our locality policy, it applies the locality policy to all peers except the initial seed. Because the goal of the initial seed is to improve piece diversity, the tracker selects the neighbors of the initial seed using the BitTorrent policy. However, we apply the locality policy to all the other seeds, that is, all the leechers that become seeds during the experiments. Note that, when the torrent is large enough, the traffic generated by the initial seed is negligible compared to the aggregate traffic of the torrent.

2.2. Round Robin Strategy 

Our locality policy controls the number of outgoing inter-ISP connections per ISP. When a peer $P$ from the ISP $I_P$ opens a new connection to a peer $P_o$ from the ISP $I_{P_o}$, the connection is outgoing for $I_P$ but incoming for $I_{P_o}$. As both outgoing and incoming connections count toward the total number of inter-ISP connections, it is important to define a strategy for selecting the peer $P_o$ that the tracker returns to peer $P$.

We define two strategies to select this peer $P_o$. In the first strategy, the default one, $P_o$ is selected at random among all peers outside $I_P$. While this strategy is straightforward, it has the notable drawback that the largest ISPs have a higher probability of having one of their peers selected than other ISPs. Therefore, large ISPs will have more incoming connections than small ones. Thus, as connections are bidirectional, the inter-ISP traffic is likely to be higher for large ISPs in this case (we confirm this intuition in section 6.2). In the second strategy, which we call Round Robin (RR), the tracker first selects the ISP with a round robin policy and then selects a peer at random in the selected ISP. This way, the probability of selecting a peer in a given ISP is independent of the size of this ISP.
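
The difference between the two selection strategies can be illustrated with the following sketch. The data layout (a dictionary from ISP to its peers) and the use of a single round-robin cycle shared by all requesting ISPs are assumptions made for illustration; the text does not specify how the round-robin state is kept. Function and class names are hypothetical.

```python
import itertools
import random
from collections import Counter


def pick_default(peers_by_isp, own_isp, rng):
    """Default strategy: uniform choice among all peers outside own_isp."""
    outside = [p for isp, peers in peers_by_isp.items() if isp != own_isp for p in peers]
    return rng.choice(outside)


class RoundRobinPicker:
    """RR strategy: pick the ISP round-robin, then a random peer inside it."""

    def __init__(self, isps, rng):
        self.order = sorted(isps)
        self.cycle = itertools.cycle(self.order)  # one shared cycle (assumption)
        self.rng = rng

    def pick(self, peers_by_isp, own_isp):
        for _ in range(len(self.order)):
            isp = next(self.cycle)
            if isp != own_isp and peers_by_isp[isp]:
                return self.rng.choice(peers_by_isp[isp])
        return None  # no other non-empty ISP


# Usage: one large ISP and two small ones; peer names start with their ISP's letter.
rng = random.Random(0)
peers = {
    "big": [f"b{i}" for i in range(900)],
    "s1": [f"x{i}" for i in range(50)],
    "s2": [f"y{i}" for i in range(50)],
}
default_hits = Counter(pick_default(peers, "s1", rng)[0] for _ in range(2000))
rr = RoundRobinPicker(peers, rng)
rr_hits = Counter(rr.pick(peers, "s1")[0] for _ in range(1000))
print(default_hits["b"] / 2000)  # close to 900/950, i.e., about 0.95
print(dict(rr_hits))             # {'b': 500, 'y': 500}
```

With the default strategy, the large ISP absorbs roughly 95% of the connections opened by the small ISP, whereas RR splits them evenly between the two other ISPs regardless of their sizes.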

In scenarios with the same number of peers in each ISP, both strategies are equivalent. Therefore, as all the experiments in sections 4 and 5 consider a homogeneous number of peers per ISP, we only present the results with the default strategy. We perform a detailed evaluation of the RR strategy in section 6.

2.3. Partition Merging Strategy 

One issue with a small number of inter-ISP connections is the higher probability of partitions in the torrent. Indeed, if the peers that have inter-ISP connections leave the torrent and no new peer joins the ISP, then this ISP will form a partition. In order to repair partitions, we introduce an additional strategy called the Partition Merging (PM) strategy. The problem of partitions in BitTorrent is not specific to our locality policy, but any locality policy makes partitions more likely.

The Partition Merging strategy is implemented as follows. On the client side, each leecher monitors the pieces received by all its neighbors using the regular BitTorrent HAVE messages. If, during a period of time selected at random in $[0, T]$, with $T$ initialized to $T_0$, the leecher cannot find any piece it needs among all its neighbors (i.e., each neighbor has a subset of the pieces of the leecher), it recontacts the tracker with a PM flag, which means that the leecher believes there is a partition and that it needs a connection to a new peer outside its ISP. For large torrents, there might be an implosion of requests at the tracker. This issue, known in the literature as the feedback implosion problem, has been extensively studied and can be solved using several techniques. However, a detailed description of a feedback implosion mechanism for the PM strategy is beyond the scope of this paper.
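
The client-side trigger can be sketched as below. Piece availability is modelled as Python sets built from bitfield and HAVE messages, time is passed in explicitly in seconds, and the waiting period is drawn uniformly in $[0, T]$. The class name and the way the timer is re-armed after firing are hypothetical choices, and any later adjustment of $T$ is left out.

```python
import random


class PartitionDetector:
    """Hypothetical client-side trigger for Partition Merging (PM) requests."""

    def __init__(self, t_max, rng=None):
        self.t_max = t_max          # T, initialized to T_0
        self.rng = rng or random.Random()
        self.starved_since = None   # when no neighbor started lacking a needed piece
        self.wait = None            # random waiting period in [0, T]

    @staticmethod
    def has_needed_piece(my_pieces, neighbor_pieces):
        """True if some neighbor advertises a piece we do not have."""
        return any(not pieces <= my_pieces for pieces in neighbor_pieces)

    def tick(self, now, my_pieces, neighbor_pieces):
        """Return True when the leecher should recontact the tracker with the PM flag."""
        if self.has_needed_piece(my_pieces, neighbor_pieces):
            self.starved_since = None
            return False
        if self.starved_since is None:
            self.starved_since = now
            self.wait = self.rng.uniform(0, self.t_max)
        if now - self.starved_since >= self.wait:
            self.starved_since = None   # re-arm for the next detection (assumption)
            return True
        return False


# Usage: with T_0 = 60 s, a leecher whose neighbors only hold subsets of its pieces.
detector = PartitionDetector(t_max=60, rng=random.Random(3))
mine = {0, 1, 2}
fired = [detector.tick(t, mine, [{0}, {1, 2}]) for t in range(121)]
print(fired.index(True) <= 60)  # True
```

Drawing the waiting period at random, rather than using $T$ directly, desynchronizes the leechers of a partitioned ISP so that they do not all contact the tracker at the same instant.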

On the tracker side, the tracker maintains for each ISP a flag that indicates whether, within the last $T_1$ minutes, it answered a request carrying the PM flag from a peer of this ISP, i.e., whether it returned a peer outside the ISP to a peer of this ISP. The tracker returns at most one peer outside each ISP every $T_1$ minutes, in order to prevent this strategy from being exploited to bypass the locality policy.
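
On the tracker side, the rate limit amounts to one timestamp per ISP, as in this sketch. Times are in seconds; the class name, the explicit `now` argument, and the choice of a uniformly random outside peer are assumptions made for illustration.

```python
import random


class PartitionMergeGate:
    """Hypothetical tracker-side limit: at most one PM answer per ISP every T_1."""

    def __init__(self, t1_seconds):
        self.t1 = t1_seconds
        self.last_answer = {}   # ISP -> time of the last PM answer for that ISP

    def handle_pm(self, isp, now, outside_peers, rng):
        """Return one peer outside `isp`, or None if the ISP's flag is still set."""
        last = self.last_answer.get(isp)
        if last is not None and now - last < self.t1:
            return None         # prevents bypassing the locality policy via PM requests
        if not outside_peers:
            return None
        self.last_answer[isp] = now
        return rng.choice(outside_peers)


# Usage with T_1 = 60 s: a second request from the same ISP 30 s later is ignored.
gate = PartitionMergeGate(t1_seconds=60)
rng = random.Random(0)
print(gate.handle_pm("A", 0, ["x"], rng))   # x
print(gate.handle_pm("A", 30, ["x"], rng))  # None
```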

A detailed evaluation of the impact of the initial timer values is beyond the scope of this study. The choice of the values is a tradeoff between reactivity and erroneous detection of partitions. In this study, we set $T_0$ and $T_1$ to one minute, and we show that this setting efficiently detects partitions without significantly increasing the inter-ISP traffic.

This strategy might be abused by an attacker. Indeed, as the PM strategy detects partitions by relying on the accuracy of the HAVE messages sent by neighbors, an attacker might send dummy HAVE messages to prevent the peers of an ISP from detecting a partition. However, this is not an issue in the context of our study, as we work in a controlled environment without attackers. In addition, we do not believe this is a major issue, for the following two reasons. First, an attacker must be a neighbor of all the peers of an ISP to attack it. However, with the locality policy, the attacker must be in the ISP it wants to attack; otherwise, it has a very low probability of becoming a neighbor of that ISP's peers. This makes the attack hard to deploy at the scale of a torrent. Second, instead of relying on the monitoring of HAVE messages, a peer can rely on the pieces it receives. For instance, a peer can combine the current PM strategy with the additional criterion that it also sends a PM request to the tracker if it does not receive any new piece within a 5-minute interval. A detailed analysis of variations of the PM strategy is beyond the scope of this study and is left for future work.

As this strategy has no impact on our experiments when there is no partition, we present the results in sections 4 and 5 without the PM strategy unless explicitly specified, that is, unless there is a partition and the PM strategy changes the result. We perform a detailed analysis of the PM strategy in section 6.
2.4. Granularity of the Notion of Locality

Our locality policy is designed to keep traffic local to ISPs. However, it is not restricted to ISPs: our locality policy can keep traffic local to any network region, as long as the tracker is aware of the regions and has a means to map peers to those regions. For instance, a tracker can use information offered by a dedicated infrastructure like the P4P infrastructure [?]. In particular, when we focus on real-world scenarios in section 6, we will use ASes instead of ISPs.

3. Methodology 

In this section, we describe our experimental setup and the metrics that we use to evaluate our experiments.

3.1. Experimental Setup 

In this paper, we run large-scale experiments to evaluate the impact of our locality policy on inter-ISP traffic and BitTorrent download completion time. We run experiments instead of simulations for two main reasons. First, it is hard to run realistic (packet-level discrete-event) P2P simulations with more than a few thousand peers, due to the large state generated by each peer and by the packets in transit on the links. Moreover, at that scale, simulations are often slower than real time. Second, the dynamics of BitTorrent are subtle and not yet deeply understood. Running simulations with a simplified version of BitTorrent may hide fundamental properties of the system.

In the following, we consider experiments with up to 10 000 peers. There are two main reasons for choosing this scale. First, torrents on the order of 10 000 peers are considered large torrents. As one of the important questions we answer in this paper is "What is, at the scale of the Internet, the reduction of traffic that can be achieved with locality?", we deemed it important to experiment with torrents that are considered large today. In particular, large torrents have a specific distribution of peers per AS. By running large-scale experiments, we were able to evaluate the impact of realistic distributions of peers per AS in section 6. Second, the maximum peer set size of the most popular BitTorrent clients is between 50 and 100 (it is 80 in this paper). Considering the impact of locality for torrents on the order of a few hundred peers would have biased our results favorably. Indeed, a locality policy can be considered as a constraint on the connectivity graph among peers, and considering small torrents with a large peer set would have artificially improved the connectivity among peers. For this reason, large torrents on the order of 10 000 peers represent a more challenging and convincing scenario.

As we will see in the presentation of our results, we observe behaviors that can only be revealed by real experiments at large scale, with up to 10 000 peers.

We now describe the experimentation platform on which we run all our experiments, the BitTorrent client that we use, and how we simulate an inter-ISP topology on top of the platform.

3.1.1. Platform

We obtain all our results by running large scale experiments 
with a real BitTorrent client. 

We run all our experiments on a dedicated experimentation platform. A typical node in this platform has a dual- or quad-core AMD Opteron CPU, 2 to 4 GB of memory, and gigabit Ethernet connectivity. The platform we used consists of 178 nodes. Once a set of nodes is reserved, no other experiment can run in parallel on those nodes. In particular, there is no virtualization on those nodes. Therefore, experiments are fully controlled and reproducible. On the nodes, we use a Linux kernel 2.6.18, which allows a much larger number of simultaneously open files or TCP connections than what we use in our experiments.

The BitTorrent client used for our experiments is an instrumented version of the mainline client [?], which is based on version 4.0.2 of the official client. This instrumented client can log specific messages received and sent. Unless specified otherwise, we use the default parameters of this client. In particular, each peer uploads at 20 kB/s to 4 other peers, and the maximum peer set size is 80. We will vary the upload capacity when studying the impact of heterogeneous upload capacities in section 4 (see section 4.1 for a description of our heterogeneous scenario). We also use the seed-state choke algorithm of version 4.0.2 of the official client. This algorithm differs somewhat from the one implemented in most BitTorrent clients today, as it is fairer and more robust. However, as it only impacts the seed, we do not believe that this algorithm has a significant impact on our results.

Our client does not implement a gossiping strategy to discover peers, like the Peer Exchange (PEX) used in the Vuze client. It is easy to make PEX locality-aware. For instance, peer A must only send to peer B, using PEX, the neighbors that are within the same ISP as peer B. However, a detailed discussion of this issue is beyond the scope of this study.
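
The locality-aware PEX rule mentioned above reduces to a filter applied by peer A before it sends a PEX message to peer B. The sketch assumes that a client knows the ISP of each of its neighbors (here through a dictionary); how a client would obtain this mapping is not addressed here, and the function name is illustrative.

```python
def pex_candidates(neighbors, recipient, isp_of):
    """Neighbors that peer A may advertise to `recipient` under locality-aware PEX."""
    target_isp = isp_of[recipient]
    return [n for n in neighbors if n != recipient and isp_of.get(n) == target_isp]


# Usage: A (ISP 1) is connected to B, C (ISP 2) and D (ISP 1); only C is sent to B.
isp_of = {"A": 1, "B": 2, "C": 2, "D": 1}
print(pex_candidates(["B", "C", "D"], "B", isp_of))  # ['C']
```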

We use the following default parameters for our experiments, unless otherwise specified. Peers share a content of 100 MB that is split into pieces of 256 kB. By default, all peers, including the initial seed, start within the first 60 seconds of the experiment. However, we will also vary this parameter in section 6.2.2, where we study the impact of churn and describe our scenario with churn. Once a leecher has completed its download, it stays 5 minutes as a seed and then leaves the torrent. We chose 5 minutes because it is long enough for peers to upload the last pieces they downloaded before becoming seeds. However, 5 minutes is short enough not to artificially increase the service capacity of the torrent, as it is small compared to the optimal download completion time (83 minutes). The initial seed stays connected for the entire duration of the experiment.

We run all our experiments with up to 100 BitTorrent clients per physical node. Therefore, for torrents with 100, 1 000, and 10 000 peers, we use respectively 1, 10, and 100 nodes for the leechers, plus one node for the seed and the tracker. Each client on a node uses a different port to allow communication among those clients. We performed a benchmarking test to find how many clients we can run on a single node without a performance penalty, which we identify as an increase in the client download time for a reference content of 100 MB. We found that we can run up to 150 clients uploading at 20 kB/s on a single node without performance penalty. To be safe, we run no more than 100 clients uploading at 20 kB/s on one node, i.e., 2 MB/s of BitTorrent workload. When we vary the upload capacity of clients in section 4, we adapt the number of clients per node so that the aggregate upload capacity per node never exceeds 2 MB/s.
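
The per-node budget used above (at most 100 clients and at most 2 MB/s of aggregate upload capacity per node) can be turned into a placement rule. The first-fit packing below is only one way to do it and is an assumption, not a description of the scripts actually used; capacities are in kB/s and 2 MB/s is taken as 2 000 kB/s.

```python
NODE_BUDGET_KBPS = 2000    # 2 MB/s of BitTorrent workload per node
MAX_CLIENTS_PER_NODE = 100


def place_clients(upload_kbps):
    """First-fit placement of clients on nodes; returns [total_kBps, n_clients] per node."""
    nodes = []
    for cap in sorted(upload_kbps, reverse=True):
        for node in nodes:
            if node[0] + cap <= NODE_BUDGET_KBPS and node[1] < MAX_CLIENTS_PER_NODE:
                node[0] += cap
                node[1] += 1
                break
        else:
            # Open a new node (a client faster than the budget gets a node of its own).
            nodes.append([cap, 1])
    return nodes


# Usage: 10 000 homogeneous leechers at 20 kB/s need 100 nodes (plus one for seed and tracker).
print(len(place_clients([20] * 10000)))  # 100
```

Sorting capacities in decreasing order before first-fit tends to leave fewer partially filled nodes when fast and slow clients are mixed.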

3.1.2. Inter-ISP Topology

Recall that our goal is to evaluate the impact of the number of inter-ISP connections on inter-ISP traffic and peer performance. Therefore, we simulate an inter-ISP topology on top of the experimentation platform we use to run our experiments. In the following, we explain how we simulate this topology and how representative it is of the real Internet.

For all our experiments, we assume a set of stub-ISPs that can communicate with each other. On top of this topology, we consider two scenarios. In the first scenario, each pair of stub-ISPs is connected by a single peering link, thus the topology of the network is a full mesh. We call peering link a link for which an ISP does not pay for traffic. However, peering equipment is expensive to upgrade, so ISPs are interested in reducing the load on those links. Indeed, the cost of upgrading the equipment increases much faster than its capacity. In the second scenario, each stub-ISP is connected with a transit link to a single transit-ISP. All peers are in stub-ISPs; therefore, there is no traffic with a source or a destination in the transit-ISP. We call transit link a link on which traffic is billed according to the 95th percentile. Therefore, ISPs are interested in reducing the bursts of traffic on those links.

Both scenarios are simply different interpretations of the same experiment, as all peers are in stub-ISPs and the traffic flows from one stub-ISP to another. In the following, we use the term inter-ISP link when our discussion applies to both peering and transit links.

In our experiments, the notion of ISPs and inter-ISP links is virtual, as we run all our experiments on an experimentation platform. To simulate the presence of a peer in a given ISP, we create a static mapping between peers and ISPs before each experiment. We use this mapping to compute offline the traffic uploaded on each inter-ISP link of the stub-ISPs. For instance, imagine that peer $P_a$ is mapped to ISP A and peer $P_b$ is mapped to ISP B. All the traffic sent from $P_a$ to $P_b$ is considered as traffic uploaded by ISP A to ISP B, either on a peering link in the first scenario or on a transit link via the transit ISP in the second scenario.
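
The offline accounting just described can be sketched as follows, assuming (hypothetically) that the client logs have been reduced to (sender, receiver, bytes) records; the record format and the function name are illustrative.

```python
from collections import defaultdict


def inter_isp_traffic(transfers, isp_of):
    """Aggregate logged uploads into bytes sent on each directed inter-ISP link."""
    per_link = defaultdict(int)
    for sender, receiver, nbytes in transfers:
        src, dst = isp_of[sender], isp_of[receiver]
        if src != dst:  # intra-ISP traffic never crosses an inter-ISP link
            per_link[(src, dst)] += nbytes
    return dict(per_link)


# Usage: Pa and Pc are in ISP A, Pb is in ISP B.
isp_of = {"Pa": "A", "Pb": "B", "Pc": "A"}
log = [("Pa", "Pb", 100), ("Pb", "Pa", 40), ("Pa", "Pc", 500), ("Pc", "Pb", 10)]
print(inter_isp_traffic(log, isp_of))  # {('A', 'B'): 110, ('B', 'A'): 40}
```

In the peering interpretation each key is a peering link; in the transit interpretation, the traffic ISP A uploads on its transit link is the sum of all entries whose source is A.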

Our experiments are equivalent to what we would have obtained in the Internet with real ISPs and inter-ISP links, except for latency. Rao et al. [?] showed that latency would not significantly change our results because: i) we limit the upload capacity of each BitTorrent client, thus the RTT is not the limiting factor for the end-to-end throughput; ii) the choke algorithm is insensitive to latency by design, as BitTorrent computes the throughput of neighbors (used to unchoke them) over a 10-second interval, which should alleviate the impact on BitTorrent of the TCP ramp-up [10] due to latency.

We experiment with and without bottlenecks in the network. By default, there is no bottleneck in the network, because the aggregate traffic generated by our experiments is always significantly lower than the bottleneck capacity of the experimentation platform. However, we also create artificial bottlenecks on inter-ISP links to evaluate their impact on inter-ISP traffic and performance (see section [?] for a description of how we limit the inter-ISP link capacity). It is important to evaluate the impact of bottlenecks on inter-ISP links because the choke algorithm selects peers according to their throughput. Therefore, bottlenecks may significantly change BitTorrent's behavior.

For each experiment, there is a single initial seed, and we select at random the ISP in which the initial seed is located. We decided to focus on the case of a single initial seed because BitTorrent locality is most challenging when there is a single seed and many leechers. Indeed, in that case, inter-ISP communications cannot be avoided. However, once there are multiple seeds spread across ISPs, performance will be higher and inter-ISP traffic will be lower. Therefore, whatever the distribution of seeds per ISP, it will be a more favorable case than the single initial seed case we address in this paper.

Finally, we have not considered a hierarchy of transit-ISPs. We show in section 6 that BitTorrent generates a huge amount of inter-ISP traffic. Even if the proposed locality policy already significantly reduces this traffic, optimizations for the transit-ISPs still make sense. We leave the detailed evaluation of traffic optimization in a hierarchy of transit-ISPs for future work.

3.2. Evaluation Metrics 

To evaluate our experiments, we consider three metrics: the 
content replication overhead, the 95th percentile, and the peer 
slowdown. 

Overhead. For each stub-ISP, we monitor the number of pieces that are uploaded from this stub-ISP to any other stub-ISP during the experiment. Then, to obtain the per-ISP content replication overhead, we normalize the amount of data uploaded by the size of the content used in the experiment. Thus, we obtain the overhead in units of contents that cross an inter-ISP link. We call this metric the content replication overhead, or overhead for short, because with the client-server paradigm, ISPs with clients only would not upload any byte. We use the overhead as a measure of the load on peering links.
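
As a small worked example of this normalization, the snippet below computes the per-ISP overhead from counts of uploaded pieces. The piece size of 256 kB matches the default setup; treating the content as exactly 400 such pieces is a simplifying assumption for the example, and the ISP names and piece counts are hypothetical.

```python
PIECE_KB = 256
CONTENT_KB = 400 * PIECE_KB  # assumption: the content is an exact number of pieces


def overhead_per_isp(pieces_uploaded, piece_kb=PIECE_KB, content_kb=CONTENT_KB):
    """Inter-ISP upload of each ISP, in units of the content size."""
    return {isp: n * piece_kb / content_kb for isp, n in pieces_uploaded.items()}


# Hypothetical counts: ISP A uploaded two full copies across inter-ISP links, B one.
print(overhead_per_isp({"A": 800, "B": 400, "C": 100}))  # {'A': 2.0, 'B': 1.0, 'C': 0.25}
```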

95th Percentile. To obtain the 95th percentile of the overhead, we compute the overhead over periods of 5 minutes and then take the 5-minute overhead corresponding to the 95th percentile. The 95th percentile is the most popular charging model used in the Internet [?].
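
The computation can be sketched as follows. Upload events are assumed to be (time in seconds, bytes) pairs for one ISP, 5-minute periods without traffic count as zero samples, and the 95th percentile is taken with the nearest-rank method commonly used for burstable billing (sort the samples and discard the top 5%); the exact percentile convention is an assumption, and the function name is illustrative.

```python
import math


def overhead_95th_percentile(events, content_bytes, duration_s, bin_s=300):
    """95th percentile of the per-5-minute overhead of one ISP."""
    n_bins = max(1, math.ceil(duration_s / bin_s))
    bins = [0] * n_bins
    for t, nbytes in events:
        bins[min(int(t // bin_s), n_bins - 1)] += nbytes
    samples = sorted(b / content_bytes for b in bins)
    rank = -(-95 * len(samples) // 100)  # nearest rank: ceil(0.95 * n) in integer arithmetic
    return samples[rank - 1]


# Usage: 20 five-minute periods (100 minutes) in which period i carries i contents.
C = 100 * 10**6
events = [(i * 300 + 1, i * C) for i in range(20)]
print(overhead_95th_percentile(events, C, duration_s=6000))  # 18.0
```

With 20 samples, the top sample (19 contents) is discarded and the billed value is the next one.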

Slowdown. We define the ideal completion time of a peer as the time for this peer to download the content at a speed equal to the average of the maximum upload capacity of all peers. This is the best completion time, averaged over all peers, that can be achieved in a P2P system in which each peer always uploads at its maximum upload capacity. The slowdown is the experimental peer download completion time normalized by the ideal completion time. For instance, imagine that all peers have the same maximum upload capacity of 20kB/s. An average peer slowdown of 1 for 10 000 peers means that the upload capacity of the peers is optimally utilized, or that the peers are, on average, as fast as in a client-server scenario with 10 000 servers, one server per client sending at 20kB/s.
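The two definitions above translate directly into code. This is a minimal sketch that assumes 100 MB = 100 000 kB and uses a hypothetical measured completion time of 6 500 s; it only illustrates the normalization, not any measured result.

```python
def ideal_completion_time(content_kb, max_upload_kbps):
    """Time (s) to get the content at the average maximum upload capacity."""
    average = sum(max_upload_kbps) / len(max_upload_kbps)
    return content_kb / average


def slowdown(completion_time_s, content_kb, max_upload_kbps):
    """Measured completion time normalized by the ideal completion time."""
    return completion_time_s / ideal_completion_time(content_kb, max_upload_kbps)


# 100 MB content taken as 100 000 kB; 1 000 peers all capped at 20 kB/s.
capacities = [20.0] * 1000
print(ideal_completion_time(100_000, capacities))   # 5000.0 seconds
print(slowdown(6500.0, 100_000, capacities))        # 1.3
```

A slowdown of 1.3 would mean that the peer took 30% longer than the ideal completion time.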

In the following, we give the average slowdown per ISP. Variability of the peer slowdown is inherent to the BitTorrent dynamics, but we do not want this variability (which the average does not show) to be worse with the locality policy than with the BitTorrent policy. Therefore, we validated for each experiment and each ISP that the deviation from the average slowdown for the slowest peers is similar with the locality and BitTorrent policies (for brevity, we only show this validation for the case of churn; see section 6.2.2). Thus, the average slowdown per ISP is enough to evaluate the impact of the locality policy on peers.

4. Impact of the Number of Inter-ISP Connections 

The goal of this section is to explore the relation between the 
number of inter-ISP connections and the overhead and slow- 
down. In particular, we will evaluate how far we can push lo- 
cality (that is, how much we can reduce the number of inter- 
ISP connections) to obtain the smallest overhead attainable and 
what is the impact of this reduction on the slowdown. 

4.1. Experimental Parameters 

For this series of experiments, we set the torrent size to 1 000 peers, the number of ISPs to 10, and the content size to 100 MB. Therefore, there are 100 peers per ISP in all the experiments of this first series. To analyze the impact of the number of inter-ISP connections on BitTorrent, we vary the number of outgoing inter-ISP connections between 4 and 40 in steps of 4, and between 400 and 3 600 in steps of 400. As we consider, in this section, scenarios with the same number of peers in each ISP, the total number of inter-ISP connections per ISP will be, on average, twice the number of outgoing inter-ISP connections. We run experiments for each of the three following scenarios.
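The grid of configurations described above can be enumerated as follows. The doubling of outgoing into total inter-ISP connections holds only on average, under the equal-peers-per-ISP assumption stated in the text.

```python
# Outgoing inter-ISP connections per ISP explored in this first series.
outgoing = list(range(4, 41, 4)) + list(range(400, 3601, 400))

# Same number of peers in every ISP: on average as many incoming as outgoing
# inter-ISP connections, so the total per ISP is twice the outgoing count.
total_per_isp = {k: 2 * k for k in outgoing}

print(len(outgoing))               # 19 configurations
print(outgoing[:3], outgoing[-3:])  # [4, 8, 12] [2800, 3200, 3600]
print(total_per_isp[3600])          # 7200
```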

Homogeneous scenario with a slow seed. In this scenario, both the initial seed and the leechers can upload at a maximum rate of 20kB/s. As we have mentioned earlier, we run 100 leechers per node, and we run the initial seed and the tracker on an additional node. According to the definition of the locality policy in section 2.1, each peer has the same probability of having a connection to the initial seed, whichever ISP it belongs to. For instance, as the initial seed has a peer set of 80, with 10 ISPs, each ISP has on average 8 peers with a connection to this initial seed.
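The "8 peers per ISP on average" figure can be illustrated with a small random draw. This sketch assumes that the initial seed's 80 neighbours are drawn uniformly at random among all leechers and that leechers are spread evenly over the ISPs; the function name and the round-robin ISP assignment are hypothetical.

```python
import random
from collections import Counter


def seed_neighbours_per_isp(n_peers=1000, n_isps=10, seed_peer_set=80, rng=None):
    """Count, per ISP, the leechers in the initial seed's peer set.

    Assumes the seed's neighbours are drawn uniformly at random from all
    leechers (the same probability for every peer, whichever its ISP) and
    that leechers are spread evenly over the ISPs.
    """
    rng = rng or random.Random()
    isp_of = [peer % n_isps for peer in range(n_peers)]  # 100 peers per ISP by default
    neighbours = rng.sample(range(n_peers), seed_peer_set)
    counts = Counter(isp_of[peer] for peer in neighbours)
    return [counts.get(isp, 0) for isp in range(n_isps)]


counts = seed_neighbours_per_isp(rng=random.Random(1))
print(counts, sum(counts) / len(counts))  # individual ISPs vary; the mean is 8.0
```

Individual ISPs may have more or fewer than 8 connections to the seed in a given draw; only the average is fixed.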

Figure 1: Overhead with 1 000 peers and 10 ISPs. Each square, circle and triangle represents the average overhead on all ISPs in a given scenario. Each dot represents this overhead for one ISP. A small number of outgoing inter-ISP connections dramatically reduces the overhead.

Heterogeneous scenario. We experiment with leechers with heterogeneous upload capacities and a fast initial seed. In each ISP, one third of the peers upload at 20kB/s, one third at 50kB/s, and one third at 100kB/s. For simplicity, we run all the leechers with the same upload capacity on the same node. Because we have determined that the hard drives cannot sustain a workload higher than 2MB/s, we run only 20 clients per node. For BitTorrent to perform optimally, the initial seed uploads at 100kB/s, as fast as the fastest leechers. Each peer has the same probability of having a connection to the initial seed, whichever ISP it belongs to.

We experiment with heterogeneous upload capacities for three reasons. First, non-local peers may be faster than local ones, so local peers may unchoke inter-ISP connections more often than intra-ISP connections, which would make reducing the number of inter-ISP connections an inefficient way to reduce inter-ISP traffic. Second, local peers may be faster than non-local ones, so inter-ISP connections may rarely be used to download new pieces, thus degrading performance. Third, with heterogeneous upload capacities inside an ISP, if the fast peers are those with the inter-ISP connections, slower peers may not be given pieces to trade among themselves, which also degrades performance.

Our main goal is to evaluate the impact of the natural clustering of peers in case of heterogeneous upload capacities [12]. As Legout et al. [12] show that the three-class upload capacity scenario is enough to observe clustering among peers, we restrict ourselves to this simple scenario. We discuss this issue further in section 8.
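A quick check of the numbers in the heterogeneous scenario follows. It assumes 1 MB = 1 000 kB. The average upload capacity is the quantity that enters the ideal completion time used for the slowdown. Dividing the 2MB/s disk budget by the rate of the fastest class gives the 20 clients per node used in the text. That comparison with the upload rate alone is only a consistency check, since the text does not detail how the disk workload was measured.

```python
# Heterogeneous scenario: one third of the peers in each upload class (kB/s).
classes_kbps = [20, 50, 100]
mean_upload_kbps = sum(classes_kbps) / len(classes_kbps)   # about 56.7 kB/s

# Disk limit per node, assuming 2 MB/s = 2 000 kB/s.
node_limit_kbps = 2000
fastest = max(classes_kbps)
clients_per_node = node_limit_kbps // fastest               # 20 clients at 100 kB/s

print(round(mean_upload_kbps, 1), clients_per_node)         # 56.7 20
```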

Homogeneous scenario with a fast seed. We experiment with leechers that upload at 20kB/s and an initial seed that uploads at 100kB/s. We run this additional experiment in order to understand whether the results obtained with the heterogeneous scenario are due to the fast initial seed or to the heterogeneous capacities of the leechers.

First, we evaluate the impact of the number of inter-ISP con- 
nections on overhead and 95th percentile. Then, we evaluate the 
impact of the number of inter-ISP connections on slowdown. 



4.2. Impact on Overhead 

Figure 2: 95th percentile with 1 000 peers and 10 ISPs. Each square, circle and triangle represents the average 95th percentile on all ISPs for a given scenario. Each dot represents this 95th percentile for one ISP. A small number of outgoing inter-ISP connections dramatically reduces the 95th percentile.

Figure 3: Slowdown with 1 000 peers and 10 ISPs. Each square, circle and triangle represents the average slowdown on all ISPs in a given scenario. Each dot represents this slowdown for one ISP. The peer slowdown remains reasonably low even for a small number of outgoing inter-ISP connections.

We observe in Fig. 1 that for the two scenarios with a well-provisioned initial seed, i.e., the homogeneous fast seed and the heterogeneous scenarios, the overhead increases linearly with the number of outgoing inter-ISP connections. Indeed, when there is no congestion in the network and the upload capacity of peers is uniformly distributed in each ISP, the probability that a peer unchokes a peer outside its own ISP depends linearly on the number of neighbors this peer has outside its own ISP, and thus on the number of outgoing inter-ISP connections. We evaluate the impact of network bottlenecks in section 5.3.
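The linear dependence can be made concrete with a simple model. It assumes equal peers per ISP (so each ISP has, on average, twice its outgoing count in inter-ISP connections), a peer set of 80 for every leecher (the value quoted for the initial seed), and equally fast, uncongested neighbours, so that an unchoke picks any neighbour with the same probability. The function is a hypothetical illustration, not the client's choke algorithm.

```python
def p_unchoke_outside(outgoing_per_isp, peers_per_isp=100, peer_set=80):
    """Probability that an unchoke goes to a neighbour outside the peer's ISP.

    Assumptions: equal peers per ISP (so each ISP has 2 * outgoing inter-ISP
    connections on average), a peer set of 80 for every peer, no congestion
    and equally fast neighbours, so each neighbour is equally likely to be
    unchoked.
    """
    outside_neighbours = 2 * outgoing_per_isp / peers_per_isp  # average per peer
    return min(1.0, outside_neighbours / peer_set)


for k in (4, 40, 400, 3600):
    print(k, p_unchoke_outside(k))
```

Doubling the number of outgoing inter-ISP connections doubles the probability, which is the linear trend visible in Fig. 1. For 3 600 outgoing connections, the model gives 0.9, the share of neighbours outside the ISP under the BitTorrent policy in this setup.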

The BitTorrent arrows in Figs. 1 and 3 represent, respectively, the overhead and the slowdown achieved by the BitTorrent policy in the same scenario. Indeed, with 1 000 peers and 10 ISPs of 100 peers, each peer has 10% of its connections inside its own ISP with the BitTorrent policy. Therefore, with BitTorrent each ISP will have 7 200 inter-ISP connections, 3 600 of those connections being outgoing. Thus, BitTorrent corresponds to the case with 3 600 outgoing inter-ISP connections in our experiments.
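The 7 200 and 3 600 figures follow from the 10% rule if one also assumes that every peer has a full peer set of 80 neighbours, the value quoted for the initial seed. With the same number of peers in every ISP, half of the inter-ISP connections are, on average, initiated from inside the ISP. Under those assumptions, the arithmetic is:

```python
def bittorrent_inter_isp_connections(total_peers=1000, peers_per_isp=100, peer_set=80):
    """Inter-ISP connections per ISP when neighbours are picked at random.

    Assumes every peer has a full peer set of 80 and that its neighbours are
    spread over the torrent in proportion to ISP size (10% local here).
    Returns (outside neighbours per peer, inter-ISP connections per ISP,
    outgoing inter-ISP connections per ISP).
    """
    outside_per_peer = peer_set * (total_peers - peers_per_isp) // total_peers
    inter_isp_per_isp = peers_per_isp * outside_per_peer
    return outside_per_peer, inter_isp_per_isp, inter_isp_per_isp // 2


print(bittorrent_inter_isp_connections())  # (72, 7200, 3600)
```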

For all three scenarios, our locality policy reduces the traffic on inter-ISP links by up to two orders of magnitude. Indeed, we see in Fig. 1 that for 3 600 outgoing inter-ISP connections, the case of the BitTorrent policy, the overhead is close to 90, and for 4 outgoing inter-ISP connections the overhead is close to 1 for all three scenarios.

Surprisingly, we observe in Fig. 1 that between 400 and 2 000 outgoing inter-ISP connections, the overhead is higher for the homogeneous scenario with a slow seed than for the two other scenarios with a fast seed. Indeed, as piece diversity is lower with a slow seed, peers in a given ISP have to rely more on their inter-ISP connections, hence a higher overhead, in order to download pieces that are missing in their own ISP. We do not observe the same issue with a fast seed because the fast initial seed guarantees a high piece diversity even for a small number of outgoing inter-ISP connections.

We also observe in Fig. 2 a linear relation between the number of outgoing inter-ISP connections and the 95th percentile, as well as a significant reduction of the 95th percentile for a small number of outgoing inter-ISP connections. However, we observe that the 95th percentile for the heterogeneous scenario is much larger than for the two other scenarios. This is because in the heterogeneous scenario two thirds of the peers are faster than 20kB/s, which is the upload capacity of all the peers in the two other scenarios. Therefore, even if the total amount of traffic crossing inter-ISP links is not significantly impacted by the distribution of the upload capacity of peers (see Fig. 1), this distribution might have a major impact on the 95th percentile that is used for charging traffic on transit links.

In summary, we have shown that a small number of outgoing inter-ISP connections reduces the overhead and the 95th percentile by up to two orders of magnitude. In addition, 4 outgoing inter-ISP connections give the minimum attainable overhead of 1. In the next section, we explore the impact of such a large reduction on the peer slowdown.

4.3. Impact on Slowdown 

The most striking result we observe in Fig. 3 is that, whereas for 4 outgoing inter-ISP connections the overhead is optimal (only one copy of the content uploaded per ISP) and reduced by two orders of magnitude compared to the BitTorrent policy, the slowdown remains surprisingly low.

Indeed, Fig. 3 shows that the number of outgoing inter-ISP connections has no significant impact on the peer slowdown for the two scenarios with a fast seed (heterogeneous and homogeneous with a fast seed), and only a negligible impact for more than 16 outgoing inter-ISP connections in the homogeneous scenario with a slow seed. This result is remarkable considering the huge savings that a small number of outgoing inter-ISP connections enables for the overhead and the 95th percentile.

For the homogeneous scenario with a slow seed, the slowdown increases by at most 43% for 4 outgoing inter-ISP connections compared to the case with the BitTorrent policy. This increase is due to poor piece diversity, which can be avoided with a fast initial seed, as shown by the two scenarios with a fast seed in Fig. 3. Moreover, even if a 43% increase is not negligible, it has to be considered as the worst case. Indeed, as we will show in section 5.3, in case of congestion on inter-ISP links, the slowdown may even improve with a small number of outgoing inter-ISP connections compared to the BitTorrent policy, because this encourages peers to exchange with peers in the same ISP, thus avoiding congested paths.

In conclusion, we see that the peer slowdown remains sur- 
prisingly low even for a small number of outgoing inter-ISP 
connections. 

5. Evaluation of 4 Outgoing Inter-ISP Connections 

We have seen in the previous section that a small number of 
outgoing inter-ISP connections dramatically reduces the over- 
head and 95th percentile, and that the slowdown remains low in 
most cases. 

Whereas this result is encouraging, one may wonder whether it is possible to minimize the overhead while keeping the slowdown low in more complex scenarios. Therefore, we focus in the following on 4 outgoing inter-ISP connections¹, which lead to the lowest attainable overhead in our experiments in section 4.2, and we evaluate the overhead and slowdown when we vary the characteristics of the torrent (torrent size and number of peers per ISP) or the characteristics of the network (limitation of the capacity of the inter-ISP links).

We consider for the remainder of this paper the homogeneous scenario with a slow seed. Indeed, we did not observe a significant impact of the heterogeneous upload capacity of the peers on our results in section 4. Moreover, it is hard, if not impossible, to obtain a realistic upload distribution of peers per torrent when one does not control the BitTorrent clients run by peers. Indeed, we know of three different methods to measure the upload capacity of a peer, and they all suffer from fundamental flaws. All those flaws come from the fact that the most popular BitTorrent clients limit the number of upload slots and the number of torrents they actively participate in according to the configured upload capacity.

The first method measures the HAVE messages, which gives the download speed of peers. However, the upload and download speeds are not correlated (clearly, the aggregate upload speed must equal the aggregate download speed, but we cannot draw conclusions on the upload speed distribution) because the number of upload slots depends on the upload capacity configured in the client, whereas the number of download slots depends on the neighborhood. Therefore, measuring the download speed of a peer does not give much information on its upload speed. The second method consists in downloading data from peers, thus making a direct upload speed measurement. However, this measurement is for a single upload slot. In the µTorrent client, for instance, a peer uploading at 20kB/s to another peer might, in fact, have between 3 and 50 different upload slots, and might participate in between 1 and 25 different torrents. Thus, the real upload speed of a peer differs greatly from what is measured on a single upload slot. The third method consists in using probing techniques to find the actual physical upload capacity of the peer. However, as we already discussed, even if the physical upload capacity is an upper bound, the actual upload speed of a peer in a given torrent might be vastly different.

Figure 4: Overhead (upper plot) and slowdown (lower plot) for torrents with 100, 1 000 and 10 000 peers and 10 ISPs in two scenarios: BitTorrent policy, locality policy with 4 outgoing inter-ISP connections. Each square and circle represents the average overhead (upper plot) or the average slowdown (lower plot) for a particular torrent size in a given scenario. Each dot represents this overhead (upper plot) or slowdown (lower plot) for one ISP. With 4 outgoing inter-ISP connections, the overhead is close to one independently of the torrent size, at the cost of a 30% increase in the slowdown.

In summary, for this second series of experiments, we consider a scenario with 4 outgoing inter-ISP connections, a content of 100 MB, peers with homogeneous upload capacities, and a slow seed². Then, we vary the torrent size, the number of peers per ISP, and the inter-ISP link capacity, one parameter at a time. We consider, in this section, scenarios with the same number of peers per ISP. Therefore, on average, the number of incoming inter-ISP connections will be equal to the number of outgoing inter-ISP connections.

In the following, we do not present results for the 95th per- 
centile, as they do not show any significant new insights com- 
pared to the results for the overhead. 

5.1. Impact of the Torrent Size 

In this section, we run experiments with torrents of 100, 1 000, and 10 000 peers, and 10 ISPs.

In Fig. 4 (upper plot), we see that for a small number of outgoing inter-ISP connections the overhead is close to one independently of the torrent size, whereas for the BitTorrent policy it increases linearly with the torrent size.



¹ The relation between the 4 outgoing inter-ISP connections and the 4 upload slots of each peer is not accidental. Indeed, each ISP needs to receive new pieces at the speed of the initial seed in order to keep the slowdown low [13]. Therefore, the minimum number of inter-ISP connections must be equal to or larger than the number of upload slots of the peers (assuming a homogeneous upload speed for all peers).



² According to Legout et al. [13], a seed as fast as the fastest leechers is enough to obtain good BitTorrent dynamics. Therefore, as taking a fast seed would have artificially decreased the slowdown, we decided to consider the slowest possible initial seed, i.e., 20kB/s in the homogeneous scenario. Indeed, a fast initial seed contributes a larger upload capacity, thus a lower slowdown for the leechers.






[Figure 5 plot: Impact of the Number of Peers per ISP. Upper and lower panels; x-axis: number of peers per ISP (log scale); series: BitTorrent, 4 connections, PM+4 connections.]

Figure 5: Overhead (upper plot) and slowdown (lower plot) with 10 000 
peers and 10, 100, 1 000, and 5 000 peers per ISP for two scenarios: 
BitTorrent policy, locality policy with 4 outgoing inter-ISP connec- 
tions. Each square, circle and triangle represents the average over- 
head (upper plot) or the average slowdown (lower plot) for a particular 
number of peers per ISP in a given scenario. Each dot represents this 
overhead (upper plot) or slowdown (lower plot) for one ISP. With 4 
outgoing inter-ISP connections, the overhead remains close to one al- 
most independently of the number of peers per ISP, but at the expense 
of an increase by at most 30% in the slowdown. 

For the torrent with 100 peers, as there are 10 ISPs, there are only 10 peers per ISP. This scenario is interesting because a locality policy only makes sense when there are enough peers inside each ISP to keep traffic local. This scenario shows the gain that can be achieved for a small number of peers per ISP. With a torrent of 100 peers, we save 60% of the overhead compared to the BitTorrent policy. With a torrent of 10 000 peers, we save 99.8% of the overhead compared to the BitTorrent policy.
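To relate these percentages to the orders of magnitude used elsewhere in the paper, note that a relative saving s corresponds to a reduction factor of 1/(1 - s). The small helper below makes the conversion explicit. The two savings fed to it are the ones reported above, and no other data is involved.

```python
import math


def reduction_factor(saving):
    """Turn a relative saving (1 - overhead_locality / overhead_bittorrent)
    into the factor by which the overhead shrinks."""
    if not 0 <= saving < 1:
        raise ValueError("saving must be in [0, 1)")
    return 1 / (1 - saving)


for saving in (0.60, 0.998):
    factor = reduction_factor(saving)
    print(f"{saving:.1%} saved -> {factor:.1f}x smaller "
          f"({math.log10(factor):.2f} orders of magnitude)")
```

A 99.8% saving therefore corresponds to an overhead about 500 times smaller.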

To see the impact of this dramatic overhead reduction on the slowdown, we focus on Fig. 4 (lower plot). We see that the slowdown is 8% higher than with the BitTorrent policy for a torrent with 100 peers. For 1 000 and 10 000 peers, the slowdown is 32% higher than with the BitTorrent policy.

In summary, we observe that with 4 outgoing inter-ISP connections, the overhead is optimal and almost independent of the torrent size, at the cost of an increase of around 30% in the slowdown.

5.2. Impact of the Number of Peers per ISP 

In this section, we evaluate 10, 100, 1 000 and 5 000 peers per 
ISP. To vary the number of peers per ISP, we vary the number 
of ISPs with a constant torrent size of 10 000 peers. Therefore, 
to obtain 10, 100, 1 000 and 5 000 peers per ISP, we consider 
1 000, 100, 10, and 2 ISPs. 

We observe in Fig. 5 (lower plot) that there are many outlier points for the scenario with 100 peers per ISP. In fact, this scenario is the only one in section 5 that creates partitions. Therefore, we also present the result of this experiment with the Partition Merging (PM) strategy presented in section 2.3. As Fig. 5 shows, the PM strategy solves this issue. We note that the results for all the other experiments remain unchanged with the PM strategy, as they do not create partitions. A detailed evaluation of the PM strategy is performed in section 6.2. In the following, we only consider the results obtained with the PM strategy for the scenario with 100 peers per ISP.

Fig. 5 (upper plot) shows that with 4 outgoing inter-ISP connections, the overhead remains close to 1 for any number of peers per ISP, whereas it increases linearly with the BitTorrent policy. However, this overhead is slightly higher for the scenarios with 10 and 5 000 peers per ISP.

We also observe in Fig. 5 (lower plot) that the slowdown is close to that of BitTorrent for 10 and 5 000 peers per ISP, and around 30% higher than that of BitTorrent for 100 and 1 000 peers per ISP. This non-monotonic behavior is explained by a tradeoff between two main factors impacting the performance of BitTorrent in this scenario. On the one hand, as the initial seed has a maximum of 80 connections to other peers, at most 80 ISPs can have a direct connection to the initial seed. All ISPs without a direct connection to the initial seed have to get all the pieces of the content from other ISPs. Therefore, the inter-ISP connections are used more and the slowdown is higher, because the few inter-ISP connections available to guarantee a high piece diversity represent a bottleneck. On the other hand, when the number of peers per ISP decreases, the number of ISPs increases because the torrent size is constant, thus the global number of inter-ISP connections increases. Therefore, the overhead increases too, but the slowdown decreases because there is a sufficient number of inter-ISP connections to guarantee a high piece diversity.
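The two competing factors can be quantified by simple counting. With a constant torrent of 10 000 peers, the sketch below reports, for each configuration, the number of ISPs, the minimum number of ISPs that cannot reach the initial seed directly (given its 80 connections), and the total number of outgoing inter-ISP connections (4 per ISP). This is a counting sketch only and does not model the slowdown.

```python
def isp_layout(total_peers=10_000, peers_per_isp_values=(10, 100, 1000, 5000),
               seed_peer_set=80, outgoing_per_isp=4):
    """For each configuration, return (peers per ISP, number of ISPs,
    minimum number of ISPs with no direct link to the initial seed,
    total outgoing inter-ISP connections in the torrent)."""
    rows = []
    for peers_per_isp in peers_per_isp_values:
        n_isps = total_peers // peers_per_isp
        # The seed has at most 80 neighbours, so it touches at most 80 ISPs.
        without_seed = max(0, n_isps - seed_peer_set)
        rows.append((peers_per_isp, n_isps, without_seed, outgoing_per_isp * n_isps))
    return rows


for row in isp_layout():
    print(row)
```

With 10 peers per ISP, at least 920 of the 1 000 ISPs depend entirely on other ISPs for pieces, but the torrent also has 4 000 outgoing inter-ISP connections to compensate. With 5 000 peers per ISP, both ISPs can reach the seed directly.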

In summary, we observe that with 4 outgoing inter-ISP connections, the overhead is optimal and almost independent of the number of peers per ISP, at the cost of an increase of at most 30% in the slowdown.

5.3. Impact of the Inter-ISP Link Capacity 

To explore the impact of the inter-ISP link capacity, we consider torrents with 1 000 peers and 10 ISPs. We vary the inter-ISP link capacity from 40kB/s to 100kB/s in steps of 20kB/s and from 200kB/s to 2 000kB/s in steps of 200kB/s. Local peers can still upload to their local neighbors (in the same ISP) at 20kB/s without crossing a link with limited capacity. For this experiment, all the BitTorrent clients that run on the same node are located in the same virtual ISP, so limiting the upload capacity of the node is equivalent to limiting the inter-ISP link capacity. For an inter-ISP link capacity of 2 000kB/s, all the BitTorrent clients located on the same node can upload outside this ISP at their full capacity without any congestion. Therefore, this case is equivalent to having no inter-ISP link bottleneck. We use the traffic control tool (tc), which is part of the iproute2 package, to limit the upload capacity of each node on which we run experiments. We deploy our own GNU/Linux image, on which we have superuser privileges, on all the nodes whose upload capacity we want to limit. Limiting the upload capacity on each node allows us to reproduce Internet bottlenecks in a controlled environment.
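The capacity grid, and the reason why 2 000kB/s behaves like an unconstrained link, can be checked with a few lines. This assumes the 100 leechers per node of the homogeneous scenario, each uploading at 20kB/s. The sketch only reproduces the arithmetic, not the tc configuration.

```python
# Inter-ISP link capacities tested (kB/s).
capacities = list(range(40, 101, 20)) + list(range(200, 2001, 200))

# All 100 leechers of a virtual ISP run on one node and upload at 20 kB/s each,
# so the node can push at most this much traffic out of the ISP.
CLIENTS_PER_NODE = 100
CLIENT_UPLOAD_KBPS = 20
max_demand = CLIENTS_PER_NODE * CLIENT_UPLOAD_KBPS  # 2000 kB/s

# Capacities below the maximum demand can become a bottleneck.
potentially_congested = [c for c in capacities if c < max_demand]
print(len(capacities), max_demand, len(potentially_congested))  # 14 2000 13
```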

[Figure 6 plot: Impact of the Inter-ISP Link Capacity. Upper and lower panels; x-axis: inter-ISP link capacity (kB/s); series: BitTorrent, 4 connections.]

Figure 6: Overhead (upper plot) and slowdown (lower plot) with 1 000 peers and 10 ISPs for various inter-ISP link capacities and two scenarios: BitTorrent policy, locality policy with 4 outgoing inter-ISP connections. Each square and circle represents the average overhead (upper plot) or slowdown (lower plot) for all ISPs in a given scenario. Each dot represents this overhead (upper plot) or slowdown (lower plot) for one ISP. With 4 outgoing inter-ISP connections, the overhead is significantly reduced for any inter-ISP link capacity and, with congested inter-ISP links, a small number of outgoing inter-ISP connections improves the slowdown compared to the case of the BitTorrent policy.

We see in Fig. 6 (upper plot) that with 4 outgoing inter-ISP connections the overhead remains close to 1.5 for any inter-ISP link capacity. For the BitTorrent policy, the overhead increases with the inter-ISP link capacity. The reason is that BitTorrent, due to the choke algorithm, prefers to exchange data with local peers when the inter-ISP links are congested, because those local peers are not on a congested path and thus provide a higher download speed. For high inter-ISP link capacities, those links are no longer congested, so the capacity no longer affects the overhead achieved by the BitTorrent policy.

We observe in Fig. 6 (lower plot) that with congestion on inter-ISP links, a small number of outgoing inter-ISP connections improves the peer slowdown. Indeed, for an inter-ISP link capacity lower than 400kB/s, the scenario with the BitTorrent policy becomes slower than the scenario with 4 outgoing inter-ISP connections. The benefit of a small number of outgoing inter-ISP connections on the slowdown is significant for highly congested inter-ISP links. For an inter-ISP link capacity of 40kB/s, the scenario with 4 outgoing inter-ISP connections is more than 200% faster than with the BitTorrent policy.

In summary, the overhead is almost independent of the inter- 
ISP link capacity for 4 outgoing inter-ISP connections, whereas 
it significantly increases with the inter-ISP link capacity for the 
BitTorrent policy. In addition, when inter-ISP links are con- 
gested, we observe a lower slowdown with the locality policy 
than with the BitTorrent policy. We discuss the impact of this 
result in the next section. 

5.4. Discussion 

We have focused on 4 outgoing inter-ISP connections and 
showed that the overhead is close to 1 in most scenarios and 
almost independent of the torrent size, the number of peers per 
ISP, and the congestion on inter-ISP links. 

But, most surprisingly, the slowdown remains close to that of the BitTorrent policy in most cases. In some scenarios, the slowdown can be around 30% larger than with the BitTorrent policy. Whereas an increase of 30% cannot be considered negligible, this is a very positive result for two main reasons.

First, we recall that our main goal in this section was to minimize the overhead. We achieved up to three orders of magnitude reduction in the overhead compared to the BitTorrent policy (see Fig. 4 for a torrent with 10 000 peers). There is a price to pay for such a huge reduction, which is an increase of at most 30% in the slowdown. We deem this increase reasonable considering the savings it enables. We have also run experiments with 40 outgoing inter-ISP connections that are not shown here for brevity, but that are available in a technical report [13]. We found that with 40 outgoing inter-ISP connections, the slowdown is always close to that of BitTorrent, at the price of a small increase in the overhead, which is close to 10 in most cases. Even with this increase in the overhead, the savings compared to the BitTorrent policy are still huge, up to two orders of magnitude in our experiments.

Second, the increase we report on the slowdown is a worst case. Indeed, all our experiments (except the ones presented in section 5.3) are performed without congestion in the network. However, we have shown in section 5.3 that in case of congestion, our locality policy can reduce the slowdown compared to the BitTorrent policy. Therefore, on a real network, the slowdown with our locality policy is likely to be equivalent to, or even better than, that of the BitTorrent policy.

It is important to understand why BitTorrent still performs very well even with a small number of inter-ISP connections, i.e., with a strong constraint on the connectivity among peers. We discuss here some avenues worth exploring on the possible reasons for the excellent resilience of BitTorrent to connectivity constraints. The rarest first piece selection strategy used in BitTorrent is known to guarantee a high piece diversity [14]. Therefore, with a small number of inter-ISP connections, the piece selection strategy succeeds in making the best use of the available inter-ISP capacity to download new pieces into the ISPs. Once new pieces are downloaded within an ISP, they are replicated fast because the connectivity among peers within an ISP is good. A simple way to look at this problem is to consider peers with connections outside their ISP as a kind of proxy for the initial seed. Those proxies request the rarest pieces from their neighbors, so they efficiently download new pieces. Then, those proxies serve those new pieces to peers within their ISP, as a seed would do. As each ISP has on average four outgoing inter-ISP connections, that is, four proxies, the aggregate service capacity of those proxies is close to that of the initial seed. In summary, the locality policy is a way to efficiently distribute the capacity of the initial seed into ISPs while minimizing the inter-ISP traffic.
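The proxy view, combined with the argument of footnote 1, suggests a back-of-the-envelope bound. If a remote uploader splits its capacity evenly over its upload slots (an assumption that only holds for homogeneous peers), each inter-ISP connection carries at most one slot's share. Matching the initial seed then requires at least seed rate divided by that share connections. The function name is hypothetical.

```python
import math


def min_inter_isp_connections(seed_upload_kbps, peer_upload_kbps, upload_slots):
    """Smallest number of inter-ISP connections whose combined rate matches
    the initial seed, if each remote uploader splits its capacity evenly
    over its upload slots (homogeneous peers, as in footnote 1)."""
    per_connection_kbps = peer_upload_kbps / upload_slots
    return math.ceil(seed_upload_kbps / per_connection_kbps)


# Slow-seed homogeneous scenario: seed and leechers at 20 kB/s, 4 upload slots.
print(min_inter_isp_connections(20, 20, 4))   # 4
```

Under these assumptions, four proxies, each receiving 5kB/s over one inter-ISP connection, together match the 20kB/s of the slow initial seed.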

6. Real World Scenarios 

Up to now, we have defined scenarios intended to understand the evolution of the overhead and slowdown with a small number of outgoing inter-ISP connections when one varies one parameter at a time. Those scenarios are not intended to be realistic, but to shed light on some specific properties achieved with a small number of outgoing inter-ISP connections.

In this last series of experiments, we use real world data to build realistic scenarios. In particular, we experiment with the measured distribution of the number of peers per AS for real torrents. In the remainder of this section, we focus on inter-AS rather than on inter-ISP traffic for two reasons. First, the information needed to map IP addresses to ASes is publicly available, whereas there is no standard way to map IP addresses or ASes to ISPs. Second, ISPs may consist of several ASes, and there is no way to find out where an ISP wants to keep traffic local. Indeed, this is most of the time an administrative decision that depends on the peering and transit relations among its own ASes and the rest of the Internet. However, assuming, as we do, that ISPs want to keep traffic local to ASes is reasonable, even if there are some cases in which ISPs want to define locality at a smaller or larger scale than the AS level. Therefore, we believe that our assumption is enough to give a coarse approximation of the potential benefits of a small number of outgoing inter-AS connections at the scale of the Internet.

In the following, we present the crawler we designed to get 
real world data. Then we present the results of experiments 
with real torrent characteristics. Finally, we give an estimation 
of the savings that would have been achieved using our locality 
policy on all the torrents we crawled. 

6.1. Description of the BitTorrent Crawler 

In order to get real world data, on the 11th of December 2008, we collected 790717 torrent files from www.mininova.com, which is considered one of the largest indexes of torrent files on the Internet. All those torrent files were collected during a period of six hours. Out of these 790717 torrent files, we removed duplicates (around 1.65% of the files) and all files for torrents that do not have at least 1 seed and 1 leecher. Our final set of torrent files consists of 214443 files.
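The filtering step can be sketched as follows. The text does not say how duplicates were detected or where the seed and leecher counts came from. This sketch assumes that duplicates share the same info hash and that the counts are already known for each torrent. The `TorrentFile` record and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TorrentFile:
    info_hash: str   # assumption: the same hash identifies a duplicate file
    seeds: int       # assumption: counts already known for the torrent
    leechers: int


def filter_torrent_files(files):
    """Drop duplicate torrent files and torrents without at least
    1 seed and 1 leecher, keeping the first occurrence of each torrent."""
    seen = set()
    kept = []
    for f in files:
        if f.info_hash in seen:
            continue  # duplicate torrent file
        seen.add(f.info_hash)
        if f.seeds >= 1 and f.leechers >= 1:
            kept.append(f)
    return kept


files = [TorrentFile("a", 3, 10), TorrentFile("a", 3, 10), TorrentFile("b", 0, 5),
         TorrentFile("c", 1, 0), TorrentFile("d", 1, 1)]
print([f.info_hash for f in filter_torrent_files(files)])  # ['a', 'd']
```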

We have implemented an efficient crawler that takes as in- 
put our set of torrent files and that gives as output the list of 
the peers in each of the torrents represented by those files. We 
identify a peer by the couple (IP,port) where IP is the IP address 
of the peer and port is the port number on which the BitTorrent 
client of this peer is listening. 

Our crawler, which consists of two main tasks, runs on a single server (Intel Core2 CPU, 4 GB of RAM). The first task processes each torrent file sequentially. It first connects to the tracker, requesting 1 000 peers in order to receive the largest number of peers the tracker can return. Indeed, the tracker returns a number of peers that is the minimum between the number of peers requested and a predefined number. The tracker returns a list of N peers, N usually ranging from 50 to 200. The tracker also returns the current number of peers in the torrent. Then, the task computes how many independent requests R must be performed in order to retrieve at least 90% of the peers in the torrent when each request returns N peers chosen at random by the tracker.
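The text does not say how R is computed. A minimal sketch, assuming each tracker reply is an independent uniform sample of N peers out of the P peers the tracker reports: a given peer is missing from one reply with probability 1 - N/P, hence from R replies with probability (1 - N/P)^R, so the expected fraction retrieved is 1 - (1 - N/P)^R and the smallest R reaching 90% is ceil(ln 0.1 / ln(1 - N/P)). The function name and the independence assumption are ours.

```python
import math

def requests_needed(num_peers, peers_per_request, target=0.9):
    """Smallest R with expected coverage 1 - (1 - N/P)**R >= target.

    Assumes (hypothetically) that every tracker reply is an independent
    uniform sample of peers_per_request (N) peers out of num_peers (P).
    """
    if peers_per_request >= num_peers:
        return 1  # a single reply already lists every peer
    miss = 1 - peers_per_request / num_peers  # P(a given peer is absent from one reply)
    return math.ceil(math.log(1 - target) / math.log(miss))
```

Under these assumptions, a torrent of 9 844 peers with replies of 200 peers needs R = 113 requests.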

The second task starts a round of R parallel instances of a dummy BitTorrent client, each client started on a different port number, whose only goal is to get a list of peers from the tracker. Once a round is completed, the task removes all duplicate (IP, port) pairs, makes sure that 90% of the peers of the torrent were indeed retrieved, and saves the list of (IP, port) pairs. In case less than 90% of the peers were discovered during the first round, an additional round is performed. The second task can start many parallel instances of the dummy BitTorrent client for different torrents at the same time. As the task of the dummy client is simple, we can run several thousands of those clients at the same time on a single machine.

Figure 7: Distribution of the number of peers per AS. Most ASes have few peers.
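The round logic of the second task can be illustrated with a small simulation. This is a sketch under stated assumptions, not the crawler itself: the tracker is replaced by a hypothetical `mock_tracker_reply` that samples peers uniformly, requests are issued sequentially rather than by parallel dummy clients, and `max_rounds` is an arbitrary safety bound.

```python
import random

def mock_tracker_reply(peers, n, rng):
    """Hypothetical tracker stand-in: n peers sampled uniformly without replacement."""
    return rng.sample(peers, min(n, len(peers)))

def crawl_torrent(peers, n, requests_per_round, target=0.9, rng=None, max_rounds=10):
    """Run rounds of requests until at least `target` of the peers are found.

    peers: list of (ip, port) pairs known to the mock tracker.
    Returns the set of distinct (ip, port) pairs and the number of rounds used.
    """
    if rng is None:
        rng = random.Random()
    found = set()
    for round_no in range(1, max_rounds + 1):
        for _ in range(requests_per_round):
            # Adding to a set discards duplicate (ip, port) pairs across replies.
            found.update(mock_tracker_reply(peers, n, rng))
        if len(found) >= target * len(peers):
            return found, round_no
    return found, max_rounds

peers = [(f"10.{i // 256}.{i % 256}.1", 6881) for i in range(2000)]
found, rounds = crawl_torrent(peers, n=100, requests_per_round=50, rng=random.Random(7))
print(len(found), rounds)
```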

At the end of this second task, we had crawled 214 443 torrents within 12 hours, the largest torrents being crawled in just a few seconds, and we had identified 6 113 224 unique peers.

Finally, we map each of the unique collected peers to the AS it belongs to using BGP information collected by the RouteViews project @]. We found that the unique peers are spread among 9 605 ASes. Even if this way to perform the mapping may suffer from inaccuracies [15, 16], it is appropriate for our purpose. Indeed, we do not need to discover AS relationships or routing information; we just need to find the AS to which each peer belongs. Even if some mappings are inaccurate, they will not significantly impact our results, as we consider the global distribution of peers among all ASes. Fig. 7 shows the distribution of peers per AS over all torrents. We observe that most ASes have few peers.
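Mapping an IP address to an AS from BGP data amounts to a longest-prefix match against announced prefixes and their origin ASes. The sketch below assumes a hypothetical, already parsed list of (prefix, origin AS) pairs; it does not parse RouteViews dumps, and the example prefixes and AS numbers come from documentation ranges. A linear scan keeps it short; a prefix trie would be needed at the scale of millions of peers.

```python
import ipaddress
from collections import Counter

def build_table(announcements):
    """announcements: iterable of (prefix string, origin AS number) pairs."""
    table = [(ipaddress.ip_network(p), asn) for p, asn in announcements]
    # Most specific prefixes first, so the first match is the longest one.
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table

def ip_to_as(table, ip):
    """Origin AS of the longest matching prefix, or None if unannounced."""
    addr = ipaddress.ip_address(ip)
    for net, asn in table:
        if addr in net:
            return asn
    return None

def peers_per_as(table, peers):
    """Count (ip, port) peers per origin AS, as plotted in a peers-per-AS histogram."""
    return Counter(ip_to_as(table, ip) for ip, _port in peers)

table = build_table([("198.51.100.0/24", 64500), ("198.51.100.128/25", 64501),
                     ("203.0.113.0/24", 64502)])
print(ip_to_as(table, "198.51.100.200"))  # falls in the more specific /25
```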

This simple but highly efficient crawler makes it possible to capture a representative snapshot, at the scale of the Internet, of the peers using BitTorrent to share content on the day of our crawl. There are, however, two limitations to our crawler. First, we only crawled torrents collected on mininova. Even if mininova is one of the largest repositories of torrent files, it contains few Asian torrents. Therefore, our results present a lower bound of the benefit that can be achieved with high locality. Indeed, Asian torrents are usually large and, due to the geographical locality inherent to such torrents, spread among fewer ASes than an average torrent. Therefore, Asian torrents have a larger potential for locality than other torrents. Second, we are aware that a fraction of the peers advertised by trackers are fake peers. Indeed, copyright holders (or their representatives) join torrents to monitor peers in order to issue DMCA takedown notices to downloaders [17]. Also, tracker operators may add fake peers in order to pollute the information gathered by copyright holders. Finally, some peers are identified as deviant, which means that they do not look like regular peers [18]. However, even if fake peers account for a few percent of the overall peers, considering the large number of torrents and peers crawled, we do not believe that those fake peers significantly bias our results.



6.2. Impact of Locality for a Real Scenario 

In section 5, we performed experiments with a homogeneous number of peers per AS. However, real torrents have a heterogeneous number of peers per AS, which may adversely impact the overhead reduction we observed with a small number of outgoing inter-ISP connections.

In order to evaluate the impact of a real distribution of peers per AS on our experiments, we selected three torrents with different characteristics from our crawl. We call those three torrents the reference torrents. The first torrent, which we call torrent 1, is a torrent for a popular movie in the English language. This torrent represents the case of torrents with worldwide interest. It has 9 844 peers spread among 1 043 ASes, the largest AS consisting of 386 peers. The second torrent, called torrent 2, is a torrent for a movie in the Italian language. This torrent has 4 819 peers spread among 211 ASes, the largest AS consisting of 2 415 peers. This torrent is typical of torrents with local interest. In particular, this torrent spans fewer ASes than torrent 1, and the largest AS, belonging to the largest Italian ISP, represents more than half of the peers of the torrent. The last torrent, called torrent 3, is a torrent for a game. It has 996 peers spread among 354 ASes, the largest AS consisting of 31 peers. This torrent is used to evaluate medium-sized torrents with little potential for savings from a locality policy, as there are few peers per AS.

6.2.1. Evaluation of ASes with Heterogeneous Number of Peers

We have run experiments with the same parameters as those of the homogeneous scenario described in section 4.1. In particular, the initial seed and all leechers upload at a maximum rate of 20 kB/s, and the content size is 100 MB. However, we consider scenarios with the same number of ASes and peers per AS as the three real torrents considered. In the following, we focus on experiments performed with the characteristics of torrent 1, as the experiments with the characteristics of the two other torrents lead to the same conclusions. We also validated that the exact location of the initial seed (which is selected at random) does not significantly impact our results. Therefore, we present the results of a single run for torrent 1.

Fig. 8 shows the overhead per AS, ordered by number of peers, for torrent 1. As expected, the overhead increases linearly with the number of peers per AS for the BitTorrent policy (squares).

We observe that the overhead for the scenario with 4 outgoing inter-AS connections is one order of magnitude lower than that of BitTorrent for the largest ASes. However, the overhead is still large for the largest ASes. In fact, due to the heterogeneity in the number of peers per AS, as explained in section 2.2, the largest ASes will have more incoming inter-AS connections than small ones. Therefore, large ASes will have a larger number of inter-AS connections, and thus a larger overhead, than small ASes.

Figure 8: Overhead for torrent 1. Each symbol (rectangle, triangle, circle, plus, and cross) represents the average overhead for all ASes with the same number of peers for a given scenario. Each dot represents the overhead for a single AS. The overhead is the lowest with 4 outgoing inter-AS connections and the RR or PM+RR strategies.

Figure 9: Slowdown for torrent 1. Each symbol (rectangle, triangle, circle, plus, and cross) represents the average slowdown for all the ASes with the same number of peers and for a given scenario. Each dot represents the slowdown for a single AS. The slowdown is the closest to BitTorrent with 4 outgoing inter-AS connections and the PM+RR strategies.

The solution to this problem is to use the Round Robin (RR) strategy introduced in section 2.2. Indeed, Fig. 8 shows that the overhead is significantly reduced with the RR strategy (cross) for large ASes without penalizing small ASes. However, we see in Fig. 9 that the slowdown for the largest ASes increases significantly compared to the other scenarios. Indeed, as the RR strategy spreads the incoming inter-AS connections uniformly across all ASes, each AS will have on average 8 inter-AS connections in total (4 outgoing and 4 incoming). Therefore, for the largest ASes, only a few peers will have an inter-AS connection. Once those peers leave the torrent after their completion, the largest AS will become partitioned, with a large number of peers waiting for new pieces from the initial seed, hence a larger slowdown.
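The effect of spreading incoming inter-AS connections can be illustrated with a toy comparison. This is a hypothetical simplification, not the RR strategy of section 2.2: `ring_round_robin` assigns AS i's k outgoing connections to ASes i+1, ..., i+k on a ring, while `size_proportional` picks remote peers uniformly, so a remote AS is hit in proportion to its size. The AS sizes are illustrative, with one AS of 386 peers as in torrent 1.

```python
import random
from collections import Counter

def ring_round_robin(num_ases, k):
    """Hypothetical RR: AS i opens its k outgoing inter-AS connections to
    ASes i+1, ..., i+k (mod num_ases), so every AS also receives exactly k."""
    if not 0 < k < num_ases:
        raise ValueError("need 0 < k < num_ases")
    return {i: [(i + 1 + j) % num_ases for j in range(k)] for i in range(num_ases)}

def size_proportional(sizes, k, rng):
    """Baseline: each outgoing connection goes to a peer chosen uniformly among
    peers outside the AS, so an AS is targeted in proportion to its size."""
    targets = {}
    for i in range(len(sizes)):
        others = [a for a in range(len(sizes)) if a != i]
        targets[i] = rng.choices(others, weights=[sizes[a] for a in others], k=k)
    return targets

def incoming(targets):
    """Number of incoming inter-AS connections per AS."""
    return Counter(t for outs in targets.values() for t in outs)

sizes = [386] + [2] * 99  # one large AS and 99 small ones (illustrative sizes)
rr_in = incoming(ring_round_robin(len(sizes), 4))
rand_in = incoming(size_proportional(sizes, 4, random.Random(0)))
print("large AS incoming, RR:", rr_in[0], "size-proportional:", rand_in[0])
```

With the ring assignment every AS ends up with 4 outgoing and 4 incoming connections, whatever its size, which is why only a handful of peers in a large AS hold inter-AS links.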

To solve this issue, we ran experiments with the Partition Merging (PM) strategy, which is intended to repair partitions quickly (see section 2.3). Indeed, we see in Fig. 9 that the scenario with 4 outgoing inter-AS connections and the PM+RR strategies (plus) gives the best slowdown of all the scenarios using a locality policy, close to that of the BitTorrent policy. This significant improvement comes at the cost of a small increase in the overhead (see Fig. 8, plus), but the overhead remains up to two orders of magnitude lower than with the BitTorrent policy.
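The PM mechanism itself is described in section 2.3 and is not reproduced here. As a hypothetical illustration of the condition it repairs, the sketch below builds the AS-level graph of current inter-AS connections, finds its connected components with a breadth-first search, and proposes one bridging link per extra component. The choice of component representatives is an assumption made only for the example.

```python
from collections import defaultdict, deque

def components(ases, links):
    """Connected components of the AS graph; links are undirected (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in ases:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        comps.append(comp)
    return comps

def merge_partitions(ases, links):
    """Hypothetical repair: link the first AS of every extra component to the
    first AS of the first component; returns the new inter-AS links."""
    comps = components(ases, links)
    return [(comps[0][0], comp[0]) for comp in comps[1:]]

print(merge_partitions(["A", "B", "C", "D"], [("A", "B"), ("C", "D")]))
```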

To show that the PM strategy does not impact our results when there is no partition, we consider a scenario with 4 outgoing inter-AS connections and the PM strategy only. We see in Fig. 8 that the overhead of this scenario (triangle) is almost indistinguishable from that of the scenario without the PM strategy (circle). We observe in Fig. 9 that the slowdown for both scenarios is also indistinguishable. Therefore, the PM strategy does not bias our results by artificially increasing the number of inter-AS connections.

In summary, the PM+RR strategies solve the issues raised by real torrents and enable a large overhead reduction with a low slowdown.

Figure 10: Overhead (left plot) and slowdown (right plot) with churn 
of 60s or 6000s for torrent 1 in two scenarios: BitTorrent policy, lo- 
cality policy with 4 outgoing inter-AS connections. Each square and 
circle represents the average overhead (left plot) or average slowdown 
(right plot) on all ASes for a specific scenario. The error bars represent 
the minimum and maximum overhead on all ASes (left plot), and the 
minimum and maximum slowdown on all peers (right plot). Even with 
churn, a small number of inter-AS connections significantly reduces 
the overhead without increasing the slowdown. 



6.2.2. Evaluation of Churn 

In this section, we run all our experiments with the characteristics of torrent 1. In particular, we consider scenarios with the same number of ASes and peers per AS as torrent 1. To evaluate the impact of churn, we start a first set of 9 844 peers using a uniform random distribution within the first 60 seconds in a first experiment, and within the first 6 000 seconds in a second experiment. Then, when each of those peers completes its download, we start a new peer from a second set of 9 844 peers. Hence, we model the three phases of a real torrent's life: flash crowd, steady phase, and end phase [19]. The first phase, the flash crowd, occurs while all peers of the first set join the torrent. The second phase, the steady phase, occurs when the number of peers in the torrent remains constant. This is when peers in the first set start to complete and peers in the second set are started to replace them, in order to keep the torrent size constant at 9 844 peers. The last phase, the end phase, occurs at the end of the torrent's life, when the last peers complete their download and no new peer joins the torrent. This is when there are no more peers in the second set to compensate for the departure of peers.
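The three phases follow directly from this arrival-and-replacement scheme. The sketch below assumes, hypothetically, that every download lasts exactly D seconds (in the experiments completion times vary): a first-set peer arriving at time a is active during [a, a + D), and its second-set replacement during [a + D, a + 2D). The value of D is illustrative.

```python
import random

def population(t, arrivals, download_time):
    """Peers active at time t, assuming every download lasts download_time (D).

    A first-set peer arriving at a is active in [a, a + D); it is replaced on
    completion by a second-set peer active in [a + D, a + 2D).
    """
    return sum(1 for a in arrivals if a <= t < a + 2 * download_time)

rng = random.Random(1)
window, d = 60.0, 1000.0  # 60 s arrival window; D = 1000 s is illustrative
arrivals = [rng.uniform(0, window) for _ in range(9844)]
for t in (30.0, 500.0, 1500.0, 2100.0):
    # flash crowd, steady phase (first set), steady phase (second set), after end phase
    print(t, population(t, arrivals, d))
```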

Large torrents, like torrent 1, represent the most challenging scenario in case of churn. Indeed, small torrents will have just one to a few peers per AS. Therefore, as most connections among peers will be inter-AS, the locality policy will not significantly constrain the peer connectivity graph. Consequently, the graph of a small torrent will be close to random, and thus more robust to AS isolation in case of churn than the graph of a large torrent, which is clustered per AS.

We see in the left plot of Fig. 10 that the maximum overhead is reduced by one order of magnitude with 4 outgoing inter-AS connections compared to the BitTorrent policy. Moreover, the locality policy does not significantly degrade the average and maximum slowdown, and it even improves the minimum slowdown, as shown in the right plot of Fig. 10.

In summary, even with churn, the overhead is reduced and the slowdown remains low with 4 outgoing inter-AS connections, independently of the churn period. We have also run many other experiments with churn, not shown for brevity. We considered: i) churn on a 600s window with real distributions of peers per AS; ii) churn on 60s, 600s, and 6000s windows with a homogeneous distribution of peers per AS, with 10 000 peers and 10 or 1 000 ASes [13]. All those additional experiments confirm our conclusion. Finally, addressing more complex churn scenarios, e.g., considering non-uniform arrival patterns of peers, is an interesting subject that will be best evaluated using a real deployment. Therefore, we leave this problem for future work.

6.3. Estimation of Locality Benefits at the Scale of the Internet 

In this section, we want to estimate the benefits our locality policy would have had on the torrents we crawled. In our crawl, 117 677 torrents and 6 643 ASes cannot benefit from a locality policy, because there is at most one peer per AS per torrent. However, we want to show that even though most of the torrents and ASes cannot benefit from a locality policy, the implementation of a locality policy at the scale of the Internet would be highly beneficial.
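The counts above follow from a simple rule: a torrent can benefit only if some AS holds at least two of its peers, and an AS can benefit only if it holds two or more peers of at least one torrent. A minimal sketch, assuming a hypothetical input that maps each torrent to the list of ASes of its peers (one entry per peer):

```python
from collections import Counter

def locality_candidates(torrents):
    """torrents: dict torrent_id -> list of AS numbers, one entry per peer.

    Returns (torrents that cannot benefit, ASes that cannot benefit), i.e.
    torrents with at most one peer per AS, and ASes that never hold two or
    more peers of the same torrent.
    """
    benefiting_torrents, benefiting_ases, all_ases = set(), set(), set()
    for tid, peer_ases in torrents.items():
        counts = Counter(peer_ases)
        all_ases.update(counts)
        multi = {asn for asn, c in counts.items() if c >= 2}
        if multi:
            benefiting_torrents.add(tid)
            benefiting_ases |= multi
    return len(torrents) - len(benefiting_torrents), len(all_ases - benefiting_ases)

print(locality_candidates({"t1": [1, 1, 2], "t2": [3, 4], "t3": [2, 5, 5]}))
```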

In order to estimate the benefits of our locality policy, we make two assumptions. First, we estimate the inter-AS traffic in all the torrents we crawled by assuming that all the peers we found start downloading the content at the same time and stay connected to the torrent for the entire duration of their download. Indeed, we have not captured temporal information, which means that we do not know how long each peer stayed in each torrent. However, it is hard to know whether we underestimate or overestimate the potential for locality of those peers. Indeed, for torrents in a flash crowd phase, most peers are leechers and the population increases with time. For those torrents, we are likely to underestimate the benefits of our locality policy. For torrents in an end phase, most peers are seeds and the population is decreasing; therefore, it is likely that we overestimate the benefits of our locality policy. We believe our assumption to be reasonable and to provide, on average, at least a coarse estimation of the inter-AS traffic generated by all the peers we crawled.

Second, we assume that each peer has the same probability of exchanging data with any peer in its peer set. Therefore, we assume that peers have the same upload capacity and that there is no network bottleneck that biases the peer selection performed by the choke algorithm. Here again, it is hard to assess the exact impact of this assumption on the accuracy of our results, but we believe that, considering the large number of torrents we crawled, our estimation of the inter-AS traffic is reasonable.

We have explored, in section 4, a scenario with three classes of upload capacity spread uniformly over all peers. We have shown that the results obtained for this scenario do not significantly differ from those of a homogeneous scenario. However, we did not explore scenarios with a realistic distribution of peer upload capacities. In fact, it is hard, if not impossible, to obtain this information at the scale of the Internet (see section 5). The clustering of peers observed by Legout et al. [12] shows that when peers in a given ISP have similar upload capacities, they will automatically keep the traffic local, because they will cluster together. However, due to the distribution of peers within a large number of ISPs, it is unlikely that the tracker will return many peers within the same ISP when there is no locality policy. Therefore, even in that case, it is unlikely that the traffic matrix will be much different from the one obtained when we assume the same upload capacity for all peers. If peers have heterogeneous upload capacities within ISPs, we can assume that connections among ISPs will be established at random, as when there is no assumption on peer upload capacity.

Moreover, the clustering observed by Legout et al. [12] appears only when there is high piece diversity. As soon as piece diversity becomes lower, there is no more clustering, even though the efficiency of BitTorrent is preserved. We observe the same kind of phenomenon with locality. Indeed, even if the upload capacity distribution tends to foster communication among peers with the same upload capacity, this is by no means an absolute constraint because, as soon as piece diversity decreases, clusters among peers are broken and any peer can communicate with any other peer.

In summary, we believe that our findings would not be fundamentally different if peer upload capacities were taken into account.

In order to estimate the benefits of our locality policy, we 
first estimate the inter-AS traffic generated with the BitTorrent 
policy, then we estimate the overhead savings enabled by our 
locality policy. 

To estimate the inter-AS traffic generated by the torrents, we assume that the probability that a peer in a given AS uploads data to a peer in another AS is only a function of the number of inter-AS connections of the ASes, and that there is no congestion on inter-AS links. In particular, for a torrent of size $S_T$, an AS $A$ of size $S_A$, and a content of size $C$, the inter-AS traffic uploaded from $A$ is $\left(1 - \frac{S_A}{S_T}\right) S_A C$. While this model is simple, we see in the upper plot of Fig. 11 that it matches well the inter-AS traffic uploaded from each AS that we measured for the three reference torrents. Then, for each AS and each torrent, we compute the inter-AS traffic using this simple model.

[Figure 11 plots: x-axis, number of peers per AS (log scale); upper plot series, Torrent 1 to 3 and Model torrent 1 to 3; lower plot, overhead savings per reference torrent.]

Figure 11: Overhead for the three reference torrents with the BitTorrent policy fitted with the estimation of this overhead using a simple model (upper plot), and overhead savings with 4 outgoing inter-AS connections with PM+RR compared to BitTorrent policy for the three reference torrents (lower plot). The simple model gives a good estimation of the overhead for the BitTorrent policy; the overhead savings per AS depend mainly on the number of peers per AS.
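The simple model can be written as a one-line function. This sketch reads the model as (1 - S_A/S_T) S_A C, our reconstruction of a formula garbled in the source: each of the S_A peers of A uploads about C on average, and under random peer selection a fraction 1 - S_A/S_T of it goes to peers outside A. The input format is hypothetical.

```python
def inter_as_upload(as_sizes, content_size):
    """Inter-AS traffic uploaded from each AS under the simple model.

    as_sizes: dict AS -> number of peers of the torrent in that AS (S_A).
    content_size: size of the content C; results are in the same unit.
    Computes (1 - S_A / S_T) * S_A * C, where S_T is the torrent size.
    """
    total = sum(as_sizes.values())  # S_T, number of peers in the torrent
    return {asn: (1 - s / total) * s * content_size for asn, s in as_sizes.items()}

print(inter_as_upload({"A": 5, "B": 3, "C": 2}, 1.0))
```

For small ASes the result is close to S_A C, which matches the linear growth of the BitTorrent overhead with the number of peers per AS; an AS holding the whole torrent uploads nothing to other ASes.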

To estimate the inter-AS traffic generated by the torrents we crawled with the locality policy with PM+RR, we use the overhead savings we obtained in the experiments with the three reference torrents. Indeed, we see in the lower plot of Fig. 11 that the overhead savings of our locality policy with PM+RR compared to the BitTorrent policy depend on the number of peers per AS, but not on the torrent size. Therefore, we use the average overhead savings computed on the three reference torrents for each AS size to compute the reduction of inter-AS traffic. We also ran the same experiments without the PM+RR strategies to estimate the inter-AS traffic with our locality policy without those strategies, and we observed that the savings also depend on the number of peers per AS.
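Applying the measured savings to the modeled traffic can be sketched as follows. The text does not say how AS sizes absent from the reference torrents are handled, so the step lookup below (use the saving of the largest measured size not exceeding the AS size, and no saving below the smallest measured size) is an assumption, as is the input format. The example savings are hypothetical, except that they average to the 40% reported for 5 peers per AS.

```python
import bisect
from statistics import mean

def average_savings(per_torrent):
    """per_torrent: list of dicts {peers_per_AS: fractional saving}, one per
    reference torrent. Returns the mean saving for every AS size observed."""
    sizes = set().union(*per_torrent)
    return {s: mean(d[s] for d in per_torrent if s in d) for s in sizes}

def saving_for(savings, size):
    """Step lookup (assumption): saving of the largest measured AS size not
    exceeding `size`; no saving below the smallest measured size."""
    keys = sorted(savings)
    i = bisect.bisect_right(keys, size) - 1
    return 0.0 if i < 0 else savings[keys[i]]

def locality_traffic(bt_traffic, as_sizes, savings):
    """Scale each AS's BitTorrent-policy inter-AS traffic by (1 - saving)."""
    return {a: t * (1 - saving_for(savings, as_sizes[a])) for a, t in bt_traffic.items()}

savings = average_savings([{5: 0.38, 10: 0.5}, {5: 0.42, 10: 0.6}])
print(locality_traffic({"A": 10.0}, {"A": 5}, savings))
```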

The lower plot of Fig. 11 shows that even with a small number of peers per AS, the overhead savings are already high. For instance, with 5 peers per AS, the overhead with our locality policy is 40% lower than with the BitTorrent policy.

Now, we focus on the impact of those savings at the scale of all the torrents we crawled. We see in the upper plot of Fig. 12 the cumulative inter-AS traffic for each torrent we crawled. The 100 (resp. 10 000) largest torrents generate 26% (resp. 82%) of the inter-AS traffic. The ideal policy corresponds to the inter-AS traffic generated when only one copy of the content is uploaded per AS and per torrent. We see that the cumulative inter-AS traffic with the BitTorrent policy is 11.6 petabytes, and that with 4 outgoing inter-AS connections it is 7.3 petabytes (and 7 petabytes with the PM+RR strategies), which is only 41% larger (35% with PM+RR) than what the ideal policy achieves. Therefore, our locality policy enables a significant reduction of the inter-AS traffic at the scale of the Internet.

Figure 12: Estimation of the cumulated inter-AS traffic for all torrents in terabytes (upper plot) and inter-AS traffic per AS in terabytes (lower plot). Significant savings can be achieved at the scale of the Internet using a locality policy.
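As a quick consistency check, the cumulative figures stated for Fig. 12 can be related with simple arithmetic (values rounded; PB stands for petabytes). The relative reduction with PM+RR, the absolute savings, and the ideal-policy traffic implied by the "41% larger" and "35% larger" statements are:

```latex
\begin{aligned}
1 - \frac{7}{11.6} &\approx 0.397, & 11.6 - 7 &= 4.6\ \text{PB} && \text{(PM+RR)}\\
1 - \frac{7.3}{11.6} &\approx 0.371 & & && \text{(4 outgoing connections only)}\\
\frac{7.3}{1.41} &\approx 5.18\ \text{PB}, & \frac{7}{1.35} &\approx 5.19\ \text{PB} && \text{(implied ideal policy)}
\end{aligned}
```

Both ways of recovering the ideal-policy traffic agree, and the roughly 40% reduction corresponds to about 4.6 PB of inter-AS traffic.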

The 50 (resp. 300) largest ASes represent 45% (resp. 84%) of the total inter-AS traffic. Interestingly, we see in the lower plot of Fig. 12 that the ASes with the largest inter-AS traffic are also the ones that benefit from the most significant inter-AS traffic reduction with our locality policy. We manually checked the 50 largest ASes to make sure that they do not belong to copyright holders (or piracy-tracking companies), and hence that most of the peers in those ASes are real peers [18].

In summary, a high locality policy can reduce the inter-AS traffic by up to 40% for the 214 443 real torrents we crawled, spread across 9 605 ASes.



7. Related Work 

Karagiannis et al. [1] first introduced the notion of locality in the context of P2P content replication. By monitoring the access link of an edge network and running simulations using a log collected from a BitTorrent tracker for a single torrent [19], they show that peer-assisted locality distribution is an efficient solution for both the ISPs and the end-users.

P4P [2] is a project whose aim is to provide a lightweight infrastructure to allow cooperation between P2P applications and ISPs. Xie et al. presented small-scale experiments (with between 53 and 160 PlanetLab nodes) on two specific scenarios. They also reported, for a field test, around 60% inter-ISP traffic savings with P4P for a single ISP and a single large torrent.

Aggarwal et al. [20] present an architecture that is similar in some respects to P4P. The authors define the notion of oracles, supplied by ISPs, that propose a list of neighbors to peers. They perform their evaluation on Gnutella using simulations and small-scale experiments with 45 Gnutella nodes.

Another approach that requires no dedicated infrastructure is Ono [3]. Ono clusters users based on the assumption that clients redirected to the same CDN server are close. The authors have developed an Ono plugin for the Vuze client and reported measurement results collected from 120 000 users of the plugin over a 10-month period. They reported up to a 207% performance increase in average peer download completion time. However, the authors did not give an explicit inter-ISP traffic reduction, but showed a reduction of the path length between peers in terms of IP and AS hops.

Bindal et al. [21] present the impact of a deterministic locality policy on ISPs' peering link load and on end-user experience. The authors considered simulations of a scenario with 14 ISPs with 50 peers each, thus a torrent of 700 peers.

Lin et al. [22] introduce ELP, which aims to keep traffic local to ISPs. They provide a model that gives bounds on the inter-ISP traffic, and they validate ELP experimentally on PlanetLab with a maximum of 60 peers.

In contrast to previous works like P4P [2] and Ono [3], which provide very valuable results focusing on in-the-wild measurements or deployments, our work fills a gap by providing a systematic and rigorous evaluation of BitTorrent through controlled experiments. Indeed, our work significantly differs from those previous ones by being the first to extensively evaluate the impact of key parameters like the number of inter-ISP connections, the torrent size, the distribution of peers per ISP, the inter-ISP bottlenecks, the churn rate, and the peer upload capacity using large-scale experiments and real-world data. In particular, we considered 214 443 real torrents spread across 9 605 ASes (there was a single large torrent and a single AS for the P4P field test [2]) and showed that using only four inter-ISP connections (it was 20% of inter-ISP connections for the P4P field tests), we could reduce the inter-ISP traffic at the scale of the Internet by up to 40%.

Piatek et al. [23] discuss pitfalls for an ISP-friendly locality policy and ISP traffic engineering constraints. In particular, Piatek et al. discuss three main issues: client-side-only localization might not work, localization might adversely impact robustness and efficiency, and ISPs have conflicting interests. The first two issues do not apply to our work, as we consider a tracker-based locality policy, and as we have designed and evaluated the partition merging strategy to prevent robustness issues. Concerning the last issue, ISPs' conflicting interests, it is beyond the scope of this study to evaluate the economic benefit for tier-1 ISPs of keeping traffic local. Our work shows that if ISPs want to apply a locality policy to BitTorrent traffic, it is doable and it will significantly reduce the traffic on inter-ISP links.

The work of Cuevas et al. [24] is close to ours in several respects. Indeed, the authors also collected a large BitTorrent trace and explored the impact of high locality. However, their work significantly differs in other respects. The core of their evaluation study is a mathematical model, whereas we performed extensive large-scale experiments. They specifically focused on the peer upload distribution, whereas we focused on the systematic evaluation of fundamental parameters like torrent size or inter-ISP bottlenecks. Finally, they do not explore the second question of this work: what is, at the scale of the Internet, the reduction of traffic that can be achieved with locality? In summary, our work is complementary to the one of Cuevas et al., as it validates some of the assumptions they made in the modeling of BitTorrent locality, in particular the good piece diversity with locality, which is fundamental to observe stratification in their model.

8. Conclusion 

Our work is intended to be complementary to previous works [1, 2, 3, 23] by answering two fundamental questions: How far can we push BitTorrent locality? What is, at the scale of the Internet, the reduction of inter-ISP traffic that can be achieved with locality?

In this paper, we have performed an extensive evaluation of the impact of a small number of inter-ISP connections on overhead and slowdown. We have run experiments with up to 10 000 real BitTorrent clients in a variety of scenarios, including scenarios based on real data crawled from 214 443 torrents representing 6 113 224 unique peers spread among 9 605 ASes.

Our main findings are that a small number of inter-ISP con- 
nections will dramatically reduce the overhead and keep the 
slowdown low independently of the torrent size, the number 
of peers per ISP, the upload capacity of peers, or the churn. 
We have introduced two new strategies called Round Robin and 
Partition Merging that make the use of a small number of inter- 
ISP connections feasible for real torrents of the Internet. 

However, we do not advocate such a small number of inter-ISP connections in real deployments. Indeed, selecting a very small number of inter-ISP connections is a design choice. For BitTorrent client companies, it is not an option to increase the peers' slowdown, even by just a few percent, in order to further reduce the load on inter-ISP links, because it will lead to a decrease in the number of users of their BitTorrent client. Conversely, content providers might want to optimize the inter-ISP traffic in order to convince ISPs not to block their BitTorrent traffic.

In this work, we intend to increase confidence in BitTorrent 
locality by showing that even in case of high locality BitTorrent 
still performs extremely well, and that with high locality the 
inter-ISP traffic reduction can be up to 40% on the torrents we 
crawled, which is 4.6 petabytes of data. 